[Bug c/92436] New: SIMD integer subtract with constant always becomes add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92436

Bug ID: 92436
Summary: SIMD integer subtract with constant always becomes add
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: zingaburga+gcc at hotmail dot com
Target Milestone: ---

Firstly, this isn't a bug but rather a missed optimization opportunity (I presume this is the place to post these?).

With the optimizer enabled, a SIMD integer subtract of a constant always seems to be turned into a SIMD integer add of the negated constant. For a target like x86 SSE, I suppose this may make sense, as the commutativity of addition gives more flexibility in register placement, but it isn't always beneficial; for example, the original constant might be reused elsewhere.

Example (x86):

    _mm_or_si128(
        _mm_sub_epi8(a, _mm_set1_epi8(99)),
        _mm_set1_epi8(99)
    );

In this case, the '99' constant could be shared by both the subtract and the OR, but GCC always converts the first use into a '-99' constant, so it now has to deal with two constants: https://godbolt.org/z/gaKAkA

This matters more when the constants are held in registers, as the negated constant occupies an extra register, which can sometimes cause otherwise unnecessary register spilling elsewhere. The behavior persists with AVX enabled, and I've even seen it on ARM NEON: https://godbolt.org/z/z3b5mq

---

Perhaps a different issue, but possibly related: when I switch the order of the subtract's arguments, GCC seems to treat the two constants as different, even though they are the same: https://godbolt.org/z/6fGhGd

For this second example ((99-a)|99), I'd have expected assembly more like:

    vmovdqa xmm1, XMMWORD PTR .LC0[rip]
    vpsubb  xmm0, xmm1, xmm0
    vpor    xmm0, xmm0, xmm1
[Bug target/92437] New: Unnecessary register duplication of vector constant in x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92437

Bug ID: 92437
Summary: Unnecessary register duplication of vector constant in x86
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zingaburga+gcc at hotmail dot com
Target Milestone: ---

Consider the following code example:

    #include <immintrin.h>

    void fn(__m128i* in, __m128i* out) {
        int i = 0;
        const __m128i num = _mm_set1_epi8(99);
        while (i < 100) {
            __m128i a = in[i];
            __m128i b = _mm_add_epi8(a, num);
            if (_mm_movemask_epi8(b))
                a = _mm_or_si128(a, num);
            if (_mm_movemask_epi8(a))
                a = _mm_or_si128(a, num);
            out[i] = a;
            i++;
        }
    }

The vector `num` is referenced three times in the loop, and GCC appears to load it into three separate registers when one would suffice: https://godbolt.org/z/mP22ez (in this link, the `99` vector is held in xmm2, xmm3 and xmm4). This happens whether or not AVX is enabled. I'm not sure what the cause could be, but the `if` conditions seem to be necessary to trigger the effect.
[Bug target/92437] Unnecessary register duplication of vector constant in x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92437

--- Comment #2 from zingaburga+gcc at hotmail dot com ---
Thanks for the info, Andrew! Changing the add to `_mm_add_epi64` does seem to eliminate all instances of the duplication.
[Bug target/114069] New: Type punning RISC-V vectors causes ICE at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114069

Bug ID: 114069
Summary: Type punning RISC-V vectors causes ICE at -O1
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zingaburga+gcc at hotmail dot com
Target Milestone: ---

Type punning a RISC-V vector causes an ICE with RV64 GCC 13.x and trunk: https://godbolt.org/z/sajcb3T7z

With GCC 13.x, it seems to work at -O0 but not at -O1.

Code:

    #include <riscv_vector.h>

    vbool8_t f(vuint8m1_t s) {
        // unavailable in GCC 13, available in trunk
        //return __riscv_vreinterpret_v_u8m1_b8(s);

        // causes ICE in GCC 13 + trunk
        return *reinterpret_cast<vbool8_t*>(&s);

        // this seems to work without issue
        vuint8mf8_t f = __riscv_vlmul_trunc_v_u8m1_u8mf8(s);
        return *reinterpret_cast<vbool8_t*>(&f);
    }

Compiler options: -march=rv64gcv -O1

Output:

    during RTL pass: expand
    <source>: In function 'vbool8_t f(vuint8m1_t)':
    <source>:8:47: internal compiler error: in convert_move, at expr.cc:219
        8 |         return *reinterpret_cast<vbool8_t*>(&s);
          |                                               ^
    0x7fb7d5029e3f __libc_start_main
            ???:0
    Please submit a full bug report, with preprocessed source.
    Please include the complete backtrace with any bug report.
    See <https://gcc.gnu.org/bugs/> for instructions.
    The bug is not reproducible, so it is likely a hardware or OS problem.
    Compiler returned: 1