[Bug c++/96709] New: cmov and vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96709

Bug ID: 96709
Summary: cmov and vectorization
Product: gcc
Version: unknown
URL: https://godbolt.org/z/GKnj17
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: g.peterh...@t-online.de
Target Milestone: ---

Hello gcc team,

I noticed two problems:
1) the compiler does not generate cmov instructions
2) the auto-vectorization is very unreliable

I would like to illustrate this using the example of a stable shift-left, see https://godbolt.org/z/GKnj17. I have implemented several variants of it.

Regarding 1): Only silent::conditional_move generates a cmov; none of the other variants do.

Regarding 2):
- The auto-vectorization only works if the smaller of the two arrays (val and bit) is at least as large as an SSE register, although the values could be adjusted.
- If vectorization is used at all, often only 128-bit code is generated (_mm_XXX) instead of 256-bit (AVX, _mm256_XXX) or larger.
- The 16-bit shift instructions from AVX512 (_mmXXX_sllv_epi16) are not used even when a suitable architecture is selected.

The complex shl_attempt_vectorize function works a little better, but not 100% either. Play around with the array size, the value/shift types and the functions!

Best regards
Gero
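
For readers who cannot open the Compiler Explorer link, here is a minimal sketch of what such a shift could look like. It is not the reporter's code: the names stable_shl and stable_shl_array are hypothetical, and the sketch assumes that "stable" means the shift stays well-defined (and yields 0) when the shift count is greater than or equal to the bit width of the type. Whether GCC emits cmov or vector code for it is exactly what the report is about, so no particular output is claimed here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not from the linked example): a left shift that is
// defined for every shift count. Assumption: counts >= the bit width of T
// should produce 0 instead of undefined behaviour.
template <typename T>
inline T stable_shl(T val, unsigned bit) noexcept
{
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
    const unsigned width = std::numeric_limits<T>::digits;
    // Clamp the count first so the shift expression itself is always valid.
    const unsigned safe = bit < width ? bit : 0u;
    // Branch-free select between the shifted value and 0; a compiler may
    // (or may not) lower this to cmov.
    return bit < width ? static_cast<T>(val << safe) : T(0);
}

// Element-wise variant over two arrays (val and bit, as named in the report).
template <typename T, std::size_t N>
inline void stable_shl_array(std::array<T, N>& val,
                             const std::array<unsigned, N>& bit) noexcept
{
    for (std::size_t i = 0; i < N; ++i)
        val[i] = stable_shl(val[i], bit[i]);
}
```

Varying T and N in stable_shl_array is one way to reproduce the kind of experiment the report suggests.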
[Bug target/96709] cmov and vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96709

--- Comment #2 from g.peterh...@t-online.de ---
You can choose the Boost version on godbolt.org. The example uses 1.73, but only for these two macros:

#define BOOST_FORCEINLINE inline __attribute__ ((__always_inline__))
#define BOOST_NOINLINE __attribute__ ((__noinline__))
[Bug target/90492] simple array-copy not use available simd-registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90492

g.peterh...@t-online.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |10.0

--- Comment #6 from g.peterh...@t-online.de ---
The bug is still present in gcc 10.0.0 20191210?! When can I expect this to be fixed?
[Bug c++/90491] New: simple operation with unsigned integer and conversion to float/double not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90491

Bug ID: 90491
Summary: simple operation with unsigned integer and conversion to float/double not vectorized
Product: gcc
Version: 8.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: g.peterh...@t-online.de
Target Milestone: ---

snip
#include
#include
int main(const int argc, const char** argv)
{
    using value_type = float; // or double
    using array_type = std::array; // size does not matter
    array_type a;
    /*
     * this loop is not vectorized
     * the explicit conversion a[i] = argc + int(i) works
     */
    for (size_t i=0; i
http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-8 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 8.3.1 20190226 [gcc-8-branch revision 269204] (SUSE Linux)
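
Because the loop body in the report above did not survive, the following sketch restates the two variants it describes. It is an illustration under assumptions, not the original test case: the array size kSize and the function names are hypothetical, and the unsigned variant is inferred from the remark that the explicit conversion a[i] = argc + int(i) works.

```cpp
#include <array>
#include <cstddef>

// Hypothetical size; the report states that the size does not matter.
constexpr std::size_t kSize = 16;
using value_type = float; // or double
using array_type = std::array<value_type, kSize>;

// Assumed shape of the problematic loop: base + i is evaluated in size_t,
// so the conversion to float starts from an unsigned 64-bit value.
inline array_type fill_unsigned(int base)
{
    array_type a{};
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = static_cast<value_type>(base + i);
    return a;
}

// Variant the report says works: the index is converted to int first,
// so the sum and the conversion to float are both signed.
inline array_type fill_signed(int base)
{
    array_type a{};
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = static_cast<value_type>(base + int(i));
    return a;
}
```

The two functions agree only while base + i is non-negative; for a negative base the unsigned sum wraps around to a very large value, so a compiler cannot silently replace one form with the other.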
[Bug c++/90492] New: simple array-copy not use available simd-registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90492

Bug ID: 90492
Summary: simple array-copy not use available simd-registers
Product: gcc
Version: 8.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: g.peterh...@t-online.de
Target Milestone: ---

snip
#include
#include
int main(const int argc, const char** argv)
{
    using value_type = int; // type does not matter
    using array_type = std::array;
    array_type a, b;
    // simple init
    for (size_t i=0; i
http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-8 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 8.3.1 20190226 [gcc-8-branch revision 269204] (SUSE Linux)
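
The test case above is truncated, so here is a rough sketch of the kind of init-and-copy loop the summary refers to. Everything in it is an assumption made for illustration: the element count kCopySize, the element type, and the function names make_init and copy_loop do not come from the report.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical element count and type; the report says the type does not matter.
constexpr std::size_t kCopySize = 16;
using copy_array = std::array<std::int64_t, kCopySize>;

// Simple init: element i holds the value i.
inline copy_array make_init()
{
    copy_array a{};
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = static_cast<std::int64_t>(i);
    return a;
}

// Plain element-wise copy, the loop whose generated code is in question.
inline void copy_loop(const copy_array& src, copy_array& dst)
{
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i];
}
```

Compiling such a file with -O3 and an explicit -march value, and adding GCC's -fopt-info-vec, is one way to see which vectorization decisions the compiler reports for these loops.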
[Bug target/90492] simple array-copy not use available simd-registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90492

--- Comment #3 from g.peterh...@t-online.de ---
On 15.05.19 at 21:20, glisse at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90492
>
> --- Comment #1 from Marc Glisse ---
>> copy's use only sse-registers and never higher
>
> What do you mean by that? Do you want AVX? Then you should let the compiler
> know that they are available (for instance -march=native).
>
Yes, I use -march=native on a Ryzen 7 2700 (which has AVX/AVX2). Alternatively, compile with -march=skylake-avx512. In all cases the copy operations use only SSE registers.
[Bug target/90492] simple array-copy not use available simd-registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90492

--- Comment #4 from g.peterh...@t-online.de ---
#include
#include
int main(const int argc, const char** argv)
{
    using value_type = int64_t;
    using array_type = std::array;
    array_type a, b;
    for (size_t i=0; i:
   0:  55                      push   %rbp
   1:  48 89 e5                mov    %rsp,%rbp
   4:  41 54                   push   %r12
   6:  53                      push   %rbx
   7:  48 83 e4 c0             and    $0xffc0,%rsp
   b:  48 8d a4 24 c0 fe ff    lea    -0x140(%rsp),%rsp
  12:  ff
  13:  62 f1 fd 48 6f 05 00    vmovdqa64 0x0(%rip),%zmm0        # 1d
  1a:  00 00 00
                       19: R_X86_64_PC32  .rodata-0x4
  1d:  48 8d 9c 24 c0 00 00    lea    0xc0(%rsp),%rbx
  24:  00
  25:  62 f1 fd 48 7f 44 24    vmovdqa64 %zmm0,0x40(%rsp)
  2c:  01
  2d:  c5 f9 6f d0             vmovdqa %xmm0,%xmm2
  31:  62 f1 fd 48 6f 05 00    vmovdqa64 0x0(%rip),%zmm0        # 3b
  38:  00 00 00
                       37: R_X86_64_PC32  .rodata+0x3c
  3b:  4c 8d a4 24 40 01 00    lea    0x140(%rsp),%r12
  42:  00
  43:  62 f1 fd 48 7f 44 24    vmovdqa64 %zmm0,0x80(%rsp)
  4a:  02
  4b:  62 f1 fd 08 6f 5c 24    vmovdqa64 0x50(%rsp),%xmm3
  52:  05
  53:  62 f1 fd 08 6f 64 24    vmovdqa64 0x60(%rsp),%xmm4
  5a:  06
  5b:  62 f1 fd 08 6f 6c 24    vmovdqa64 0x70(%rsp),%xmm5
  62:  07
  63:  62 f1 fd 08 6f 74 24    vmovdqa64 0x90(%rsp),%xmm6
  6a:  09
  6b:  62 f1 fd 08 6f 7c 24    vmovdqa64 0xa0(%rsp),%xmm7
  72:  0a
  73:  62 f1 fd 08 6f 4c 24    vmovdqa64 0xb0(%rsp),%xmm1
  7a:  0b
  7b:  62 f1 fd 08 7f 54 24    vmovdqa64 %xmm2,0xc0(%rsp)
  82:  0c
  83:  62 f1 fd 08 7f 5c 24    vmovdqa64 %xmm3,0xd0(%rsp)
  8a:  0d
  8b:  62 f1 fd 08 7f 64 24    vmovdqa64 %xmm4,0xe0(%rsp)
  92:  0e
  93:  62 f1 fd 08 7f 44 24    vmovdqa64 %xmm0,0x100(%rsp)
  9a:  10
  9b:  62 f1 fd 08 7f 6c 24    vmovdqa64 %xmm5,0xf0(%rsp)
  a2:  0f
  a3:  62 f1 fd 08 7f 74 24    vmovdqa64 %xmm6,0x110(%rsp)
  aa:  11
  ab:  62 f1 fd 08 7f 7c 24    vmovdqa64 %xmm7,0x120(%rsp)
  b2:  12
  b3:  62 f1 fd 08 7f 4c 24    vmovdqa64 %xmm1,0x130(%rsp)
  ba:  13
  bb:  0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  c0:  48 8b 33                mov    (%rbx),%rsi
  c3:  bf 00 00 00 00          mov    $0x0,%edi
                       c4: R_X86_64_32  std::cout
  c8:  48 83 c3 08             add    $0x8,%rbx
  cc:  e8 00 00 00 00          callq  d1
                       cd: R_X86_64_PLT32  std::ostream& std::ostream::_M_insert(long)-0x4
  d1:  48 89 c7                mov    %rax,%rdi
  d4:  ba 01 00 00 00          mov    $0x1,%edx
  d9:  c6 44 24 3f 20          movb   $0x20,0x3f(%rsp)
  de:  48 8d 74 24 3f          lea    0x3f(%rsp),%rsi
  e3:  e8 00 00 00 00          callq  e8
                       e4: R_X86_64_PLT32  std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)-0x4
  e8:  49 39 dc                cmp    %rbx,%r12
  eb:  75 d3                   jne    c0
  ed:  48 8d 65 f0             lea    -0x10(%rbp),%rsp
  f1:  31 c0                   xor    %eax,%eax
  f3:  5b                      pop    %rbx
  f4:  41 5c                   pop    %r12
  f6:  5d                      pop    %rbp
  f7:  c3                      retq
  f8:  0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  ff:  00
[Bug tree-optimization/90491] simple operation with unsigned integer and conversion to float/double not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90491

--- Comment #2 from g.peterh...@t-online.de ---
example:
#include
#include
int main(const int argc, const char** argv)
{
    using value_type = float;
    using array_type = std::array;
    array_type a;
    for (size_t i=0; i:
   0:  55                      push   %rbp
   1:  48 63 ff                movslq %edi,%rdi
   4:  53                      push   %rbx
   5:  48 8d 64 24 a8          lea    -0x58(%rsp),%rsp
   a:  48 85 ff                test   %rdi,%rdi
   d:  0f 88 b9 01 00 00       js     1cc
  13:  c4 e1 fa 2a c7          vcvtsi2ss %rdi,%xmm0,%xmm0
  18:  c5 fa 11 44 24 10       vmovss %xmm0,0x10(%rsp)
  1e:  48 89 f8                mov    %rdi,%rax
  21:  48 83 c0 01             add    $0x1,%rax
  25:  0f 88 2a 03 00 00       js     355
  2b:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  30:  c5 fa 11 44 24 14       vmovss %xmm0,0x14(%rsp)
  36:  48 89 f8                mov    %rdi,%rax
  39:  48 83 c0 02             add    $0x2,%rax
  3d:  0f 88 f8 02 00 00       js     33b
  43:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  48:  c5 fa 11 44 24 18       vmovss %xmm0,0x18(%rsp)
  4e:  48 89 f8                mov    %rdi,%rax
  51:  48 83 c0 03             add    $0x3,%rax
  55:  0f 88 c6 02 00 00       js     321
  5b:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  60:  c5 fa 11 44 24 1c       vmovss %xmm0,0x1c(%rsp)
  66:  48 89 f8                mov    %rdi,%rax
  69:  48 83 c0 04             add    $0x4,%rax
  6d:  0f 88 94 02 00 00       js     307
  73:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  78:  c5 fa 11 44 24 20       vmovss %xmm0,0x20(%rsp)
  7e:  48 89 f8                mov    %rdi,%rax
  81:  48 83 c0 05             add    $0x5,%rax
  85:  0f 88 62 02 00 00       js     2ed
  8b:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  90:  c5 fa 11 44 24 24       vmovss %xmm0,0x24(%rsp)
  96:  48 89 f8                mov    %rdi,%rax
  99:  48 83 c0 06             add    $0x6,%rax
  9d:  0f 88 30 02 00 00       js     2d3
  a3:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  a8:  c5 fa 11 44 24 28       vmovss %xmm0,0x28(%rsp)
  ae:  48 89 f8                mov    %rdi,%rax
  b1:  48 83 c0 07             add    $0x7,%rax
  b5:  0f 88 fe 01 00 00       js     2b9
  bb:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  c0:  c5 fa 11 44 24 2c       vmovss %xmm0,0x2c(%rsp)
  c6:  48 89 f8                mov    %rdi,%rax
  c9:  48 83 c0 08             add    $0x8,%rax
  cd:  0f 88 cc 01 00 00       js     29f
  d3:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  d8:  c5 fa 11 44 24 30       vmovss %xmm0,0x30(%rsp)
  de:  48 89 f8                mov    %rdi,%rax
  e1:  48 83 c0 09             add    $0x9,%rax
  e5:  0f 88 9a 01 00 00       js     285
  eb:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  f0:  c5 fa 11 44 24 34       vmovss %xmm0,0x34(%rsp)
  f6:  48 89 f8                mov    %rdi,%rax
  f9:  48 83 c0 0a             add    $0xa,%rax
  fd:  0f 88 68 01 00 00       js     26b
 103:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
 108:  c5 fa 11 44 24 38       vmovss %xmm0,0x38(%rsp)
 10e:  48 89 f8                mov    %rdi,%rax
 111:  48 83 c0 0b             add    $0xb,%rax
 115:  0f 88 36 01 00 00       js     251
 11b:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
 120:  c5 fa 11 44 24 3c       vmovss %xmm0,0x3c(%rsp)
 126:  48 89 f8                mov    %rdi,%rax
 129:  48 83 c0 0c             add    $0xc,%rax
 12d:  0f 88 04 01 00 00       js     237
 133:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
 138:  c5 fa 11 44 24 40       vmovss %xmm0,0x40(%rsp)
 13e:  48 89 f8                mov    %rdi,%rax
 141:  48 83 c0 0d             add    $0xd,%rax
 145:  0f 88 d2 00 00 00       js     21d
 14b:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
 150:  c5 fa 11 44 24 44       vmovss %xmm0,0x44(%rsp)
 156:  48 89 f8                mov    %rdi,%rax
 159:  48 83 c0 0e             add    $0xe,%rax
 15d:  0f 88 a0 00 00 00       js     203
 163:  c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
 168:  c5 fa 11 44 24 48       vmovss %xmm0,0x48(%rsp)
 16e:  48 83 c7 0f             add    $0xf,%rdi
 172:  78 75                   js     1e9
 174:  c4 e1 fa 2a c7          vcvtsi2ss %rdi,%xmm0,%xmm0
 179:  c5 fa 11 44 24 4c       vmovss %xmm0,0x4c(%rsp)
 17f:  48 8d 5c 24 10          lea    0x10(%rsp),%rbx
 184:  48 8d 6c 24 50          lea    0x50(%rsp),%rbp
 189:  0f 1f 80 00 00 00 00    nopl   0x0(%rax)
 190:  c5 fa 10 03             vmovss (%rbx),%xmm0
 194:  bf 00 00 00 00          mov    $0x0,%edi
                       195: R_X86_64_32  std::cout
 199:  c5 fa 5a c0             vcvtss2sd %xmm0,%xmm0,%xmm0
 19d:  48 83 c3 04             add    $0x4,%rbx
 1a1:  e8 00 00 00 00          callq  1a6
[Bug target/90600] New: incompatible 64-bit-types in x86-intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90600

Bug ID: 90600
Summary: incompatible 64-bit-types in x86-intrinsics
Product: gcc
Version: 9.1.1
Status: UNCONFIRMED
Keywords: ssemmx
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: g.peterh...@t-online.de
Target Milestone: ---
Host: x86-64
Target: x86-64
Build: 9.1.1

COLLECT_GCC=gcc-9
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=hsa:nvptx-none
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,ada,go,d --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --without-cuda-driver --disable-werror --with-gxx-include-dir=/usr/include/c++/9 --enable-ssp --disable-libssp --disable-libvtv --disable-cet --disable-libcc1 --enable-plugin --with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-libphobos --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-9 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --with-build-config=bootstrap-lto-lean --enable-link-mutex --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 9.1.1 20190520 [gcc-9-branch revision 271396] (SUSE Linux)

snip 1:
using intrinsic_int64_t = decltype(_mm_cvtsi128_si64(__m128i{}));
std::cout<)< false true

snip 2:
uint64_t a, b, r;
uint8_t carry;
carry = _addcarry_u64(carry, a, b, &r);
-> error: invalid conversion from ‘uint64_t*’ {aka ‘long unsigned int*’} to ‘long long unsigned int*’ [-fpermissive]

Hello,
you're using incompatible 64-bit types in the x86 intrinsics. Why are the default types from "types.h" not used always and everywhere?

PS: Is there any hope that the completely outdated short/int/long types (from the '70s) will be completely replaced by u/intX_t (as keywords!)? With (signed/unsigned) char != u/int8_t, so that you can write:

uint8_t ui = 65;
int8_t si = 65;
unsigned char uc = 65;
signed char sc = 65;
char c = 65;
std::cout << ui ...
-> 65 65 A A A

Regards
Gero
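
The conversion error in snip 2 comes from uint64_t being unsigned long on this platform while the intrinsic expects an unsigned long long pointer; the two types have the same width but are distinct. A minimal sketch of a caller-side workaround follows. To keep it portable, add_carry_ull is a hypothetical stand-in with the same pointer type as the intrinsic's out-parameter, and add_carry_u64 is a hypothetical wrapper name; neither is part of GCC.

```cpp
#include <cstdint>
#include <type_traits>

// long and long long are always distinct types, even where both are 64 bits
// wide, which is why their pointers do not convert implicitly.
static_assert(!std::is_same<long, long long>::value, "distinct types");

// Hypothetical stand-in for an intrinsic whose out-parameter is declared as
// unsigned long long*. carry_in is expected to be 0 or 1.
inline unsigned char add_carry_ull(unsigned char carry_in, unsigned long long x,
                                   unsigned long long y, unsigned long long* out)
{
    const unsigned long long partial = x + y;          // may wrap around
    const unsigned long long sum = partial + carry_in; // may wrap around
    *out = sum;
    return static_cast<unsigned char>((partial < x) || (sum < partial));
}

// Hypothetical wrapper: accepts std::uint64_t* and passes a local of the
// type the callee expects, so no pointer conversion is needed.
inline std::uint8_t add_carry_u64(std::uint8_t carry_in, std::uint64_t x,
                                  std::uint64_t y, std::uint64_t* out)
{
    static_assert(sizeof(unsigned long long) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit unsigned long long");
    unsigned long long tmp = 0;
    const unsigned char carry_out = add_carry_ull(carry_in, x, y, &tmp);
    *out = tmp;
    return carry_out;
}
```

The same local-variable pattern should apply when calling the real intrinsic, at the cost of one extra copy that an optimizing compiler can usually remove.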
[Bug target/90600] incompatible 64-bit-types in x86-intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90600

--- Comment #2 from g.peterh...@t-online.de ---
On 23.05.19 at 19:04, jakub at gcc dot gnu.org wrote:
> Note, clang agrees with gcc here, and I don't think it is a good idea to
> change
> this incompatibly.

I think it would be better if there were (on the respective platform) exactly one fixed type for each significant size, and if that type were used consistently everywhere. Then such problems could not occur at all.

PS: I miss the I/O routines for __int128 (u/int128_t); clang has them. Will they be retrofitted? On 32-bit platforms (and smaller) these types do not exist either. (Can clang do that?)
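
Until such I/O routines exist, a value can be converted to text by hand. The sketch below is one possible way to do that, not an existing library facility: the alias u128 and the function to_string_u128 are hypothetical names, and the code assumes a 64-bit GCC or Clang target where the __int128 extension is available.

```cpp
#include <algorithm>
#include <string>

// __int128 is a compiler extension (GCC/Clang, 64-bit targets);
// __extension__ keeps -pedantic builds quiet about it.
__extension__ typedef unsigned __int128 u128;

// Hypothetical helper: decimal representation of an unsigned 128-bit value.
inline std::string to_string_u128(u128 value)
{
    if (value == 0)
        return "0";
    std::string digits;
    while (value != 0) {
        // Peel off the least significant decimal digit.
        digits.push_back(static_cast<char>('0' + static_cast<int>(value % 10)));
        value /= 10;
    }
    // Digits were collected least significant first.
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

The resulting std::string can be streamed as usual, for example std::cout << to_string_u128(x).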
[Bug target/90600] incompatible 64-bit-types in x86-intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90600

--- Comment #4 from g.peterh...@t-online.de ---
On 23.05.19 at 20:11, glisse at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90600
>
> --- Comment #3 from Marc Glisse ---
> Intel documents that it uses "unsigned __int64" but I don't see where they
> document what __int64 is. We could take a "void *out" argument and cast it
> inside the function, but that would lose useful diagnostics for people trying
> to pass a 32-bit type. We could overload in C++. Not sure any of that is worth
> the trouble, those interfaces are target-specific anyway.
>
What else should "unsigned __int64" be other than a uint64_t (0..2^64-1)? The definition would then look exactly like this:

extern __inline uint8_t
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_addcarry_u64 (uint8_t __CF, uint64_t __X, uint64_t __Y, uint64_t *__P)
{
  return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
}

I also miss addcarry/subborrow for uint8/16/128. You could make that available as a general __builtin :-) Of course, it would be better if such functions were included in the C/C++ standard ...
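
As an illustration of what a width-generic addcarry/subborrow could look like in portable C++, here is a small sketch. It is not a GCC builtin or a proposed interface: the names carry_out, add_carry and sub_borrow are hypothetical, and the code makes no claim about the instructions a compiler generates for it.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical result type: the wrapped value plus the carry (or borrow) out.
template <typename T>
struct carry_out {
    T value;
    bool carry;
};

// x + y + carry_in for any unsigned integer type T.
template <typename T>
inline carry_out<T> add_carry(T x, T y, bool carry_in) noexcept
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    // The casts truncate back to T after integer promotion of narrow types.
    const T partial = static_cast<T>(x + y);
    const T sum = static_cast<T>(partial + (carry_in ? 1 : 0));
    // A wrap-around in either step means a carry out; both cannot wrap at once.
    return { sum, partial < x || sum < partial };
}

// x - y - borrow_in for any unsigned integer type T.
template <typename T>
inline carry_out<T> sub_borrow(T x, T y, bool borrow_in) noexcept
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    const T partial = static_cast<T>(x - y);
    const T diff = static_cast<T>(partial - (borrow_in ? 1 : 0));
    // Borrow out if y exceeds x, or if the incoming borrow takes 0 below zero.
    return { diff, x < y || (borrow_in && partial == 0) };
}
```

Chaining the carry from one call into the next gives multi-word arithmetic for any element width, for example add_carry<std::uint8_t>(200, 100, false) followed by a call on the next higher bytes with the returned carry.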