[Bug c/97833] New: -Wconversion behaves erratic

2020-11-14 Thread sven.koehler at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97833

Bug ID: 97833
   Summary: -Wconversion behaves erratic
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

Created attachment 49559
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49559&action=edit
non-working example

Attached is an example for which the behavior of -Wconversion is
incomprehensible.

Why does it yield a warning for test2 but not for test1 and test3?
This happens with gcc for 32-bit ARM and gcc for x86_64.

In all 3 functions, we have 2 shift operations. The operand of each shift is
uint16_t, which is promoted to int. The result of each shift is cast to
uint16_t, so the operands of the bitwise OR are again uint16_t, and both are
promoted to int once more.

So basically, in all 3 cases the returned expression has type int. However,
-Wconversion warns only in 1 case.
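
The promotion chain described above can be checked at compile time. The
following is only a sketch and not the attached test case: the names combine
and IS_INT are hypothetical, and it assumes a platform where int is wider than
16 bits, so that uint16_t promotes to int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: evaluates to 1 if expression e has type int, else 0. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* Assumption: int is wider than 16 bits, so a uint16_t operand promotes to int. */
static_assert(IS_INT((uint16_t)0 << 1), "uint16_t is expected to promote to int");

/* Hypothetical function, not one of test1..test3 from the attachment. */
uint16_t combine(uint16_t x, uint16_t y) {
    /* Each shift is computed in int, each inner cast narrows back to uint16_t,
       and the bitwise OR promotes both operands to int again. The outer cast
       makes the final narrowing explicit. */
    return (uint16_t)((uint16_t)(x << 1) | (uint16_t)(y << 0));
}

/* Returns 1 if shifting a uint16_t operand yields an int. */
int shift_result_is_int(void) {
    uint16_t x = 1;
    return IS_INT(x << 1);
}

/* Returns 1 if the OR of two operands cast to uint16_t yields an int. */
int or_result_is_int(void) {
    uint16_t x = 1, y = 2;
    return IS_INT((uint16_t)(x << 1) | (uint16_t)(y << 0));
}
```

If both helper functions return 1, the type of the returned expression is int
regardless of the shift count, which is the premise of the question.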

Also, why does it matter whether x is shifted by 0 or 1? Why does a shift by 0
result in a warning, and a shift by 1 does not?

Why does it matter whether x and y are originally uint8_t values cast to
uint16_t (test2) or are uint16_t to begin with (test3)? In both cases, the
result of the shifts is cast to uint16_t.


Is gcc trying to keep track of the range of the individual expressions? Is gcc
somehow failing when a shift by 0 occurs? I believe that a shift by zero is
defined behavior.

[Bug c/101950] New: __builtin_clrsb is never inlined

2021-08-17 Thread sven.koehler at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101950

Bug ID: 101950
   Summary: __builtin_clrsb is never inlined
   Product: gcc
   Version: 11.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

With gcc 11.1 on ARM 32-bit and Intel, I don't see that __builtin_clrsb is
inlined. On AARCH64 it is inlined and the cls instruction is used, as expected.
I use the C code below to compare the generated assembly. For ARM, I use -O3
-mcpu=cortex-a53 -marm and for Intel I just use -O3.


On ARM 32-bit, clrsb1 seems to be the fastest code (see below for the assembly
code) since clz handles zero correctly. On Intel, bsr does not handle zero,
hence the workaround in clrsb2 of setting the lsb before calling __builtin_clzl
(see below for the assembly code). On Intel, clrsb1 is slightly longer and uses
a jump to handle the zero case. clang apparently uses variant clrsb1 on ARM and
Intel, and it's inlined on both architectures when using -O3.





#define SHIFT (sizeof(x)*8-1)

int clz(unsigned long x) {
    if (x == 0) {
        return sizeof(x)*8;
    }
    return __builtin_clzl(x);
}

int clsb(long x) {
    return clz(x ^ (x >> SHIFT));
}

int clrsb1(long x) {
    return clsb(x)-1;
}

int clrsb2(long x) {
    x = ((x << 1) ^ (x >> SHIFT)) | 1;
    return __builtin_clzl(x);
}

int clrsb3(long x) {
    return __builtin_clrsbl(x);
}
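
To check that the two hand-written variants compute the same value as the
builtin, a small comparison along the following lines can be used. This is only
a sketch: the names clrsb_via_clz, clrsb_via_lsb and variants_agree are
hypothetical, the left shift is done on an unsigned value to avoid signed
overflow, and an arithmetic right shift for negative values is assumed (it is
implementation-defined in C, but it is what GCC and clang do).

```c
#include <assert.h>
#include <limits.h>

#define LONG_BITS ((int)(sizeof(long) * CHAR_BIT))

/* Assumption: >> on a negative long is an arithmetic shift. */
static_assert((-1L >> 1) == -1L, "arithmetic right shift assumed");

/* Variant 1: fold the sign into the value, count leading zeros with an
   explicit zero check, then subtract one for the sign bit itself. */
int clrsb_via_clz(long x) {
    unsigned long folded = (unsigned long)(x ^ (x >> (LONG_BITS - 1)));
    int leading_zeros = (folded == 0) ? LONG_BITS : __builtin_clzl(folded);
    return leading_zeros - 1;
}

/* Variant 2: shift the sign bit out first and set the lsb, so that
   __builtin_clzl never sees zero and no branch is needed. */
int clrsb_via_lsb(long x) {
    unsigned long sign_mask = (unsigned long)(x >> (LONG_BITS - 1));
    unsigned long folded = (((unsigned long)x << 1) ^ sign_mask) | 1UL;
    return __builtin_clzl(folded);
}

/* Returns 1 if both variants match __builtin_clrsbl on all samples. */
int variants_agree(void) {
    const long samples[] = {0, 1, -1, 2, -2, 12345, -12345,
                            LONG_MAX, LONG_MIN, LONG_MAX / 2, LONG_MIN / 2};
    const int count = (int)(sizeof(samples) / sizeof(samples[0]));
    for (int i = 0; i < count; ++i) {
        int expected = __builtin_clrsbl(samples[i]);
        if (clrsb_via_clz(samples[i]) != expected ||
            clrsb_via_lsb(samples[i]) != expected) {
            return 0;
        }
    }
    return 1;
}
```

The sample list is small and only meant to cover the edge cases (zero, all
ones, and the extreme values); it is not an exhaustive test.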



on ARM 32-bit:
clrsb1:
eor x0, x0, x0, asr 63
clz x0, x0
sub w0, w0, #1
ret

on Intel:
clrsb2:
lea rax, [rdi+rdi]
sar rdi, 63
xor rax, rdi
or  rax, 1
bsr rax, rax
xor eax, 63
ret

[Bug middle-end/101973] New: subtraction of clz is not optimized

2021-08-18 Thread sven.koehler at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101973

Bug ID: 101973
   Summary: subtraction of clz is not optimized
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

On Intel x86_64, the generated code for __builtin_clzl(x) looks something like
this: clz(x) = 63 - bsr(x)

Since Intel does not seem to have a way to compute 63-y in a single
instruction, XOR is used instead and the actual assembly code corresponds to
clz(x) = 63 ^ bsr(x). Since bsr(x) is in the range 0 to 63, the XOR with 63 is
equivalent to 63 - bsr(x).
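
The equivalence of the XOR and the subtraction on this range is easy to verify
exhaustively. The following is a minimal sketch; the function names are
hypothetical and only serve the illustration.

```c
#include <assert.h>

/* Hypothetical helper: maps a bsr result (bit index 0..63) to a
   leading-zero count using XOR, as the generated code does. */
int clz_from_bsr_xor(int bsr_index) {
    assert(bsr_index >= 0 && bsr_index <= 63);
    return 63 ^ bsr_index;
}

/* Returns 1 if 63 ^ y equals 63 - y for every y in [0, 63]. */
int xor_matches_subtraction(void) {
    for (int y = 0; y <= 63; ++y) {
        if (clz_from_bsr_xor(y) != 63 - y) {
            return 0;
        }
    }
    return 1;
}
```

The identity holds because 63 has all of its low six bits set, so subtracting
any y in that range never borrows. Outside the range it breaks down, for
example 63 ^ 64 is 127 while 63 - 64 is -1.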

However, when we actually need the index of the most significant non-zero bit,
we apply another 63-y on top of clz, as in this function:

int bsr(unsigned long x) {
    return sizeof(x)*8 - 1 - __builtin_clzl(x);
}


With -O3, GCC emits the following assembly code:

bsr:
bsr rdi, rdi
mov eax, 63
xor rdi, 63
sub eax, edi
ret


The XOR with 63 and the subtraction from 63 cancel each other out in this
special case. LLVM/clang performs this optimization.

One might also consider the general case z-clz(x), for an arbitrary z, as a
test case. On Intel, this is equivalent to bsr(x)+(z-63).
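
Both identities can be checked against a simple shift loop. This is a sketch
only: the names msb_index_loop, msb_index_clz, z_minus_clz and identities_hold
are hypothetical, and 63 is generalized to the width of unsigned long minus
one.

```c
#include <assert.h>
#include <limits.h>

#define ULONG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Reference: index of the most significant set bit, found by shifting. */
int msb_index_loop(unsigned long x) {
    assert(x != 0);
    int index = 0;
    while (x >>= 1) {
        ++index;
    }
    return index;
}

/* The function from the report, with 63 generalized to ULONG_BITS - 1. */
int msb_index_clz(unsigned long x) {
    assert(x != 0); /* __builtin_clzl(0) is undefined */
    return ULONG_BITS - 1 - __builtin_clzl(x);
}

/* Hypothetical general case: z - clz(x) for an arbitrary z. */
int z_minus_clz(int z, unsigned long x) {
    assert(x != 0);
    return z - __builtin_clzl(x);
}

/* Returns 1 if both identities hold on all samples. */
int identities_hold(void) {
    const unsigned long samples[] = {1UL, 2UL, 3UL, 255UL, 4096UL,
                                     ULONG_MAX / 3, ULONG_MAX};
    const int zs[] = {0, 1, 63, 64, 100};
    const int n = (int)(sizeof(samples) / sizeof(samples[0]));
    const int m = (int)(sizeof(zs) / sizeof(zs[0]));
    for (int i = 0; i < n; ++i) {
        int msb = msb_index_loop(samples[i]);
        if (msb_index_clz(samples[i]) != msb) {
            return 0;
        }
        for (int j = 0; j < m; ++j) {
            /* z - clz(x) == bsr(x) + (z - (width - 1)) */
            if (z_minus_clz(zs[j], samples[i]) != msb + (zs[j] - (ULONG_BITS - 1))) {
                return 0;
            }
        }
    }
    return 1;
}
```

Whether a compiler folds the constant into a single addition after bsr is a
separate question; the sketch only confirms that the rewrite is value
preserving for non-zero inputs.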

[Bug middle-end/101973] subtraction of clz is not optimized

2021-08-18 Thread sven.koehler at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101973

Sven changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Sven ---
OK. Closing this myself.