[Bug c/97833] New: -Wconversion behaves erratically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97833

            Bug ID: 97833
           Summary: -Wconversion behaves erratically
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

Created attachment 49559
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49559&action=edit
non-working example

Find attached an example for which -Wconversion behaves incomprehensibly. Why does it yield a warning for test2 but not for test1 and test3? This happens both with gcc for 32-bit ARM and with gcc for x86_64.

In all 3 functions, there are 2 shift operations. Each operand is a uint16_t, which is promoted to int. The result of each shift is cast back to uint16_t, so the operands of the bitwise OR are again uint16_t, and both are in turn promoted to int. So in all 3 cases the return expression is effectively an int, yet -Wconversion warns in only 1 case.

Also, why does it matter whether x is shifted by 0 or by 1? Why does a shift by 0 result in a warning while a shift by 1 does not? And why does it matter whether x and y are originally uint8_t cast to uint16_t (test2) or uint16_t to begin with (test3)? In both cases, the result of the shifts is cast to uint16_t.

Is gcc trying to keep track of the value range of the individual expressions, and somehow failing when a shift by 0 occurs? I believe that a shift by zero is defined behavior.
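The attachment itself is not reproduced here; the following is a minimal sketch of the kind of functions the report describes. The names test1/test2/test3, the exact shift counts, and the way the two values are combined are assumptions based on the description above, not the actual attached code:

#include <stdint.h>

/* Hypothetical reconstruction of the attached example, for illustration only. */

uint16_t test1(uint16_t x, uint16_t y) {
    /* uint16_t operands, shifts by 1 and 8 */
    return (uint16_t)(x << 1) | (uint16_t)(y << 8);
}

uint16_t test2(uint8_t x, uint8_t y) {
    /* uint8_t operands cast to uint16_t, shifts by 0 and 8 */
    return (uint16_t)((uint16_t)x << 0) | (uint16_t)((uint16_t)y << 8);
}

uint16_t test3(uint16_t x, uint16_t y) {
    /* uint16_t operands, shifts by 0 and 8 */
    return (uint16_t)(x << 0) | (uint16_t)(y << 8);
}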
[Bug c/101950] New: __builtin_clrsb is never inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101950

            Bug ID: 101950
           Summary: __builtin_clrsb is never inlined
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

With gcc 11.1 on 32-bit ARM and on Intel, __builtin_clrsb is not inlined. On AArch64 it is inlined and the cls instruction is used, as expected.

I use the C code below to compare the generated assembly. For ARM I use -O3 -mcpu=cortex-a53 -marm, and for Intel I just use -O3.

On 32-bit ARM, clrsb1 seems to give the fastest code (see the assembly below), since clz handles zero correctly. On Intel, bsr does not handle zero, hence the workaround of setting the lsb before calling __builtin_clzl (see the assembly below). On Intel, clrsb1 is slightly longer and uses a jump to handle the zero case. clang apparently uses variant clrsb1 on both ARM and Intel, and it is inlined on both architectures when using -O3.

#define SHIFT (sizeof(x)*8-1)

int clz(unsigned long x) {
    if (x == 0) {
        return sizeof(x)*8;
    }
    return __builtin_clzl(x);
}

int clsb(long x) {
    return clz(x ^ (x >> SHIFT));
}

int clrsb1(long x) {
    return clsb(x) - 1;
}

int clrsb2(long x) {
    x = ((x << 1) ^ (x >> SHIFT)) | 1;
    return __builtin_clzl(x);
}

int clrsb3(long x) {
    return __builtin_clrsbl(x);
}

On ARM 32-bit:

clrsb1:
        eor     x0, x0, x0, asr 63
        clz     x0, x0
        sub     w0, w0, #1
        ret

On Intel:

clrsb2:
        lea     rax, [rdi+rdi]
        sar     rdi, 63
        xor     rax, rdi
        or      rax, 1
        bsr     rax, rax
        xor     eax, 63
        ret
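As a quick cross-check (this harness is an addition for illustration, not part of the report), the variants can be compiled together with the functions above and compared on a few values; all three should agree with __builtin_clrsbl:

#include <assert.h>

/* Declarations for the functions defined in the report above;
   link this file together with that code. */
int clrsb1(long x);
int clrsb2(long x);
int clrsb3(long x);

int main(void) {
    /* values near LONG_MAX/LONG_MIN are avoided because clrsb2 left-shifts
       a signed value, which would overflow for them */
    long tests[] = { 0, 1, -1, 2, -2, 42, -42, 123456789L, -123456789L };
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        assert(clrsb1(tests[i]) == clrsb3(tests[i]));
        assert(clrsb2(tests[i]) == clrsb3(tests[i]));
    }
    return 0;
}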
[Bug middle-end/101973] New: subtraction of clz is not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101973

            Bug ID: 101973
           Summary: subtraction of clz is not optimized
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

On Intel x86_64, the generated code for __builtin_clzl(x) computes something like

    clz(x) = 63 - bsr(x)

Since Intel does not seem to have a single instruction for 63 - y, XOR is used instead, and the actual assembly corresponds to clz(x) = 63 ^ bsr(x). Because bsr(x) is in the range 0 to 63, the XOR with 63 is equivalent to subtracting from 63.

However, when we actually need the index of the most significant non-zero bit, we subtract from 63 again, as in this function:

int bsr(unsigned long x) {
    return sizeof(x)*8 - 1 - __builtin_clzl(x);
}

With -O3, GCC emits the following assembly:

bsr:
        bsr     rdi, rdi
        mov     eax, 63
        xor     rdi, 63
        sub     eax, edi
        ret

In this special case, the XOR with 63 and the subtraction from 63 cancel each other out. LLVM/clang performs this optimization. One might also consider the arbitrary case z - clz(x) as a test case; on Intel, this is equivalent to bsr(x) + (z - 63).
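A minimal check of the arithmetic behind the cancellation (a sketch added here, not part of the report): for any y = bsr(x) in the range 0 to 63, 63 ^ y equals 63 - y, so 63 - (63 ^ y) is just y and the two operations are redundant.

#include <assert.h>

int main(void) {
    for (int y = 0; y <= 63; y++) {
        /* 63 is all ones in the low 6 bits, so XOR with 63 flips exactly
           the bits that subtraction from 63 would clear: no borrows occur */
        assert((63 ^ y) == 63 - y);
        assert(63 - (63 ^ y) == y);
    }
    return 0;
}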
[Bug middle-end/101973] subtraction of clz is not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101973

Sven changed:

           What    |Removed     |Added
----------------------------------------------------
         Status    |UNCONFIRMED |RESOLVED
     Resolution    |---         |FIXED

--- Comment #2 from Sven ---
OK. Closing this myself.