[Bug rtl-optimization/92712] New: Performance regression with assumed values

2019-11-28 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92712

Bug ID: 92712
   Summary: Performance regression with assumed values
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mike.k at digitalcarbide dot com
  Target Milestone: ---

The following code generates progressively worse code from GCC 7.5 to GCC 8.3
to GCC 9.1 (and trunk):

static void func_base(int t, const int v) {
    int x = 0;
    for (int i = 0; i < t; ++i) {
        x += v;
    }
    volatile int d = x;
}

void func_default(int t, const int v) {
    func_base(t, v);
}

void func_assumed(int t, const int v) {
    if (t < 0) __builtin_unreachable();
    func_base(t, v);
}

On GCC 7.5 (-O2):

func_default(int, int):
  test edi, edi
  jle .L3
  imul edi, esi
  mov DWORD PTR [rsp-4], edi
  ret
.L3:
  xor edi, edi
  mov DWORD PTR [rsp-4], edi
  ret
func_assumed(int, int):
  imul edi, esi
  mov DWORD PTR [rsp-4], edi
  ret

On GCC 8.3 (-O2):

func_default(int, int):
  test edi, edi
  jle .L3
  imul edi, esi
  mov DWORD PTR [rsp-4], edi
  ret
.L3:
  xor edi, edi
  mov DWORD PTR [rsp-4], edi
  ret
func_assumed(int, int):
  test edi, edi
  je .L6
  imul edi, esi
.L6:
  mov DWORD PTR [rsp-4], edi
  ret

On GCC 9.1 and trunk (-O2):

func_default(int, int):
  test edi, edi
  jle .L3
  sub edi, 1
  imul edi, esi
  add esi, edi
  mov DWORD PTR [rsp-4], esi
  ret
.L3:
  xor esi, esi
  mov DWORD PTR [rsp-4], esi
  ret
func_assumed(int, int):
  test edi, edi
  je .L6
  sub edi, 1
  imul edi, esi
  add edi, esi
.L6:
  mov DWORD PTR [rsp-4], edi
  ret

This occurs regardless of whether `func_base` is allowed to be inlined or is
inlined manually.

It does not occur in LLVM-Clang or in Microsoft Visual C++.
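All three GCC versions compute the same value. They differ only in how many instructions they spend on it. The sketch below is my own and does not come from the report, and all function names in it are hypothetical. It writes each output back as C++ and checks it against the original loop. It assumes that keeping `t` and `v` small is enough to avoid signed overflow, which would otherwise be undefined behavior in the loop.

```cpp
#include <cassert>

// Original loop: adds v to x exactly t times (zero times when t <= 0).
int loop_sum(int t, int v) {
    int x = 0;
    for (int i = 0; i < t; ++i) {
        x += v;
    }
    return x;
}

// What the GCC 7.5 func_default output computes: test/jle handles
// t <= 0, otherwise a single imul.
int closed_form_gcc75(int t, int v) {
    return t > 0 ? t * v : 0;
}

// What the GCC 9.1 output computes: (t - 1) * v + v, the same value
// as t * v but with an extra sub and add.
int closed_form_gcc91(int t, int v) {
    return t > 0 ? (t - 1) * v + v : 0;
}

// With t >= 0 assumed (func_assumed), no branch is needed: for t == 0
// the product is already 0. This matches the GCC 7.5 func_assumed output.
int assumed_form(int t, int v) {
    return t * v;
}

// Compares every form against the loop over a small range
// (small values keep every product clear of signed overflow).
void check_small_range() {
    for (int t = -3; t <= 10; ++t) {
        for (int v = -5; v <= 5; ++v) {
            const int expected = loop_sum(t, v);
            assert(closed_form_gcc75(t, v) == expected);
            assert(closed_form_gcc91(t, v) == expected);
            if (t >= 0) {
                assert(assumed_form(t, v) == expected);
            }
        }
    }
}
```

Because the forms are equal, the `sub`/`add` pair in the 9.1 output is pure overhead. The same holds for the zero test that 8.3 and 9.1 keep in `func_assumed`.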

[Bug rtl-optimization/93605] New: GCC suboptimal tail call optimization in trivial function forwarding with __attribute__((noinline))

2020-02-05 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93605

Bug ID: 93605
   Summary: GCC suboptimal tail call optimization in trivial
function forwarding with __attribute__((noinline))
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mike.k at digitalcarbide dot com
  Target Milestone: ---

In a trivial function-forwarder where `__attribute__((noinline))` is specified
on the forwardee, an extra `movzx` instruction is generated (on x86-64) prior
to the tail call. This does not occur on Clang.

Observe (https://godbolt.org/z/kFGCpW):

```
namespace impl {
__attribute__((noinline))
static int func (bool v, int a, int b) {
    return v ? a/b : b/a;
}
}

int func(bool v, int a, int b) {
    return impl::func(v, a, b);
}
```

On all tested versions (trunk (10) to GCC 4), this produces the following
assembly for `func`:

```
func(bool, int, int):
  movzx edi, dil
  jmp impl::func(bool, int, int)
```

On Clang trunk (10) back to, but not including, Clang 5.0.0, this produces the
following assembly for `func`:

```
func(bool, int, int): # @func(bool, int, int)
  jmp impl::func(bool, int, int) # TAILCALL
```

Clang 5.0.0 and below produce identical assembly to GCC:

```
func(bool, int, int): # @func(bool, int, int)
  movzx edi, dil
  jmp impl::func(bool, int, int) # TAILCALL
```

[Bug target/93605] GCC suboptimal tail call optimization in trivial function forwarding with __attribute__((noinline))

2020-02-05 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93605

--- Comment #2 from mike.k at digitalcarbide dot com ---
Interestingly, changing `impl::func`'s signature from `bool v` to `auto&& v`
fixes the issue. Changing it to `auto v` does not.
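For reference, the change described above amounts to something like the following. This is a sketch only: the modified code is not shown here. It spells `auto&& v` as the equivalent pre-C++20 form `template <typename T> ... T&& v`, because abbreviated function templates require C++20. Whether the `movzx` disappears depends on the compiler version and flags. The checks below only confirm that the behavior is unchanged.

```cpp
#include <cassert>

namespace impl {
// Variant: `v` is a forwarding reference rather than a bool.
// `template <typename T> ... T&& v` is the pre-C++20 spelling of `auto&& v`.
template <typename T>
__attribute__((noinline)) static int func(T&& v, int a, int b) {
    return v ? a / b : b / a;
}
}  // namespace impl

// Forwarder as in the report; `v` is an lvalue, so T deduces to bool&.
int func(bool v, int a, int b) {
    return impl::func(v, a, b);
}

// Checks that the variant computes the same results as the bool version.
void check_forwarder() {
    assert(func(true, 6, 3) == 2);   // a / b
    assert(func(false, 3, 6) == 2);  // b / a
    assert(func(false, 6, 3) == 0);  // b / a truncates 3 / 6 to 0
}
```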

[Bug middle-end/91459] New: Tail-Call Optimization is not performed when return value is assumed.

2019-08-15 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91459

Bug ID: 91459
   Summary: Tail-Call Optimization is not performed when return
value is assumed.
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mike.k at digitalcarbide dot com
  Target Milestone: ---

In situations where a function either returns a specific value or does not
return at all, GCC fails to perform tail call optimizations. This appears to
occur on all GCC versions with -O1, -O2, -O3, and -Os. It occurs with both the
C and C++ front-ends.

Observe:

/* This function is guaranteed to return only the value 1; otherwise it does
   not return at all. It is meant to emulate a function such as 'exec'. */
extern int function_returns_only_1_or_doesnt_return(int, int);

int foo1(int a, int b) {
    const int result = function_returns_only_1_or_doesnt_return(a, b);
    if (result == 1) {
        return result;
    }
    else {
        __builtin_unreachable();
    }
}

int foo2(int a, int b) {
    return function_returns_only_1_or_doesnt_return(a, b);
}


This results in the following output for -O3 on x86-64:

foo1(int, int):
  push rax
  call function_returns_only_1_or_doesnt_return(int, int)
  mov eax, 1
  pop rdx
  ret
foo3(int, int):
  jmp function_returns_only_1_or_doesnt_return(int, int)

While the behavior is correct, the tail-call version is considerably more
efficient and preserves the same semantics.

The same behavior occurs with other architectures as well, so it does not
appear to be a back-end issue.
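To make the claimed equivalence concrete, here is a runnable sketch. It gives the extern function a hypothetical body that either returns 1 or terminates the process, roughly the way a successful `exec` never comes back. Giving the function a body in the same translation unit lets GCC inline it. This file therefore checks semantics only and does not reproduce the missed tail call.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the extern function: returns 1 or never returns.
int function_returns_only_1_or_doesnt_return(int a, int b) {
    if (a < 0 || b < 0) {
        std::exit(EXIT_FAILURE);  // terminates; control never reaches the caller
    }
    return 1;
}

// foo1 as in the report: the else branch promises the compiler that the
// call never comes back with any value other than 1.
int foo1(int a, int b) {
    const int result = function_returns_only_1_or_doesnt_return(a, b);
    if (result == 1) {
        return result;
    } else {
        __builtin_unreachable();
    }
}

// foo2 as in the report: a plain forwarding call, compiled to a tail jump.
int foo2(int a, int b) {
    return function_returns_only_1_or_doesnt_return(a, b);
}

// Whenever the call returns, both wrappers return the same value.
void check_same_result() {
    for (int a = 0; a < 4; ++a) {
        for (int b = 0; b < 4; ++b) {
            assert(foo1(a, b) == foo2(a, b));
        }
    }
}
```

Since the only value that can come back is 1, returning the callee's result directly, as `foo2` does, is observably identical to `foo1`. That is why a `jmp` would be a valid lowering of `foo1` as well.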

[Bug middle-end/91459] Tail-Call Optimization is not performed when return value is assumed.

2019-08-15 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91459

--- Comment #1 from mike.k at digitalcarbide dot com ---
'foo3' in the assembly output should be 'foo2'. I'd changed the function name
in my test code and did not update the assembly. Apologies.

[Bug c++/82658] New: Suboptimal codegen on AVR when right-shifting 8-bit unsigned integers.

2017-10-22 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82658

Bug ID: 82658
   Summary: Suboptimal codegen on AVR when right-shifting 8-bit
unsigned integers.
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mike.k at digitalcarbide dot com
  Target Milestone: ---

This issue has been confirmed as far back as at least 5.4.0, and it still occurs
in trunk.

When an unsigned char/uint8_t is shifted right by fewer than 4 bits, suboptimal
code is generated. This behavior only occurs when compiling source files as
C++, not as C, even when the source files are otherwise equivalent. The issue
does not manifest with left shifts or with wider integer types (such as
uint16_t).

Trivial test:

void test ()
{
    volatile unsigned char val;
    unsigned char local = val;
    local >>= 1;
    val = local;
}

Compiling as C++ (avr-g++ [-O3|-O2] -mmcu=atmega2560 test.cpp -S -c -o test.s)
results in the following assembly sequence handling the load, shift, and store:

  ldd r24,Y+1
  ldi r25,0
  asr r25
  ror r24
  std Y+1,r24

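The next operation performed on r25 is a clr, so its shifted value is never
used. Because r25 is loaded with 0, asr r25 leaves it at 0 and clears the carry
flag, and ror r24 then shifts that zero carry into bit 7. The ldi/asr/ror
sequence is therefore exactly equivalent to lsr r24, which is what the C front
end emits: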

Compiling as C (avr-gcc [-O3|-O2] -mmcu=atmega2560 test.c -S -c -o test.s)
results in the following assembly sequence handling the load, shift, and store:

  ldd r24,Y+1
  lsr r24
  std Y+1,r24

This is optimal code, and it matches the shift sequence defined in avr.c.

The issue gets worse with larger shift counts (up to 4, where the defined
behavior takes over again): g++ repeats the same instruction sequence for each
bit, whereas gcc simply emits 'lsr; lsr; lsr' for a shift by 3, as expected.

Interestingly, the issue does _not_ manifest with integer division instead of a
shift: dividing the unsigned char by 2 instead of shifting it right by 1 emits
'lsr' as expected.
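The equivalence argument for the AVR sequences can be double-checked with a small model of the three instructions involved. This is my own sketch. `AvrState` and the helper names are hypothetical, and the flag behavior follows the AVR definitions of lsr, asr, and ror. It assumes that the C++ output performs `ldi r25,0` once and then repeats the `asr r25`/`ror r24` pair once per bit. For every byte and for shift counts 1 to 3, it checks that the result matches repeated `lsr`. It also checks that dividing by 2 matches a shift by 1.

```cpp
#include <cassert>
#include <cstdint>

// Register state relevant here: r24, r25, and the carry flag.
struct AvrState {
    std::uint8_t r24 = 0;
    std::uint8_t r25 = 0;
    bool carry = false;
};

// lsr Rd: bit 0 goes to carry, 0 enters bit 7.
void lsr(std::uint8_t& reg, bool& carry) {
    carry = (reg & 0x01) != 0;
    reg = static_cast<std::uint8_t>(reg >> 1);
}

// asr Rd: bit 0 goes to carry, bit 7 is preserved.
void asr(std::uint8_t& reg, bool& carry) {
    carry = (reg & 0x01) != 0;
    reg = static_cast<std::uint8_t>((reg >> 1) | (reg & 0x80));
}

// ror Rd: rotate right through carry; the old carry enters bit 7.
void ror(std::uint8_t& reg, bool& carry) {
    const bool out = (reg & 0x01) != 0;
    reg = static_cast<std::uint8_t>((reg >> 1) | (carry ? 0x80 : 0x00));
    carry = out;
}

// Assumed g++ sequence: ldi r25,0, then one asr r25 / ror r24 pair per bit.
std::uint8_t cxx_sequence(std::uint8_t value, int shift) {
    AvrState s;
    s.r24 = value;
    s.r25 = 0;  // ldi r25,0
    for (int i = 0; i < shift; ++i) {
        asr(s.r25, s.carry);  // r25 stays 0, carry becomes 0
        ror(s.r24, s.carry);  // shifts that 0 into bit 7 of r24
    }
    return s.r24;
}

// gcc (C) sequence: one lsr r24 per bit.
std::uint8_t c_sequence(std::uint8_t value, int shift) {
    AvrState s;
    s.r24 = value;
    for (int i = 0; i < shift; ++i) {
        lsr(s.r24, s.carry);
    }
    return s.r24;
}

// Exhaustively compares both sequences against a plain logical shift.
void check_all_bytes() {
    for (int v = 0; v < 256; ++v) {
        const auto value = static_cast<std::uint8_t>(v);
        for (int shift = 1; shift <= 3; ++shift) {
            const auto expected = static_cast<std::uint8_t>(value >> shift);
            assert(cxx_sequence(value, shift) == expected);
            assert(c_sequence(value, shift) == expected);
        }
        // Division by 2 yields the same byte as a shift by 1.
        assert(static_cast<std::uint8_t>(value / 2) == c_sequence(value, 1));
    }
}
```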

[Bug middle-end/82658] Suboptimal codegen on AVR when right-shifting 8-bit unsigned integers.

2017-10-31 Thread mike.k at digitalcarbide dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82658

--- Comment #2 from mike.k at digitalcarbide dot com ---
I wanted to check whether this issue also shows up in the toolchains for other
architectures, so I tested a few:

GCC 7.2.0 on x86-64 (-O3):

C:

  movzx   eax, BYTE PTR [rsp-1]
  shr al
  mov BYTE PTR [rsp-1], al
  ret

C++:

  movzx   eax, BYTE PTR [rsp-1]
  sar eax
  mov BYTE PTR [rsp-1], al
  ret

While not different in performance, it _is_ generating different code, and the
code difference seems to reflect what Richard already found.
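In both C and C++, `local >>= 1` promotes `local` to `int` before shifting. A signed 32-bit shift of the zero-extended byte, which is what the C++ output does, is therefore a valid lowering. The sketch below is mine, and its helper names are hypothetical. It models the two x86-64 sequences and confirms that they store the same byte for all 256 inputs.

```cpp
#include <cassert>
#include <cstdint>

// C output: shr al, a logical shift of the 8-bit value.
std::uint8_t shr_al(std::uint8_t al) {
    return static_cast<std::uint8_t>(al >> 1);
}

// C++ output: movzx eax, byte; sar eax; store al.
// movzx zero-extends, so eax is non-negative and the arithmetic shift
// brings in a 0 bit, exactly as a logical shift would.
std::uint8_t movzx_sar_eax(std::uint8_t byte) {
    const std::int32_t eax = static_cast<std::int32_t>(byte);  // movzx
    const std::int32_t shifted = eax >> 1;                     // sar eax
    return static_cast<std::uint8_t>(shifted);                 // mov ..., al
}

// Exhaustively compares the two sequences.
void check_x86_equivalence() {
    for (int v = 0; v < 256; ++v) {
        const auto byte = static_cast<std::uint8_t>(v);
        assert(shr_al(byte) == movzx_sar_eax(byte));
    }
}
```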

I am not able to reproduce any difference on MIPS64, MIPS32, ARM, ARM64, PPC,
or PPC64. This is probably because those back ends happen to map both
sequences to the same instructions.

I do see it going back to GCC 4.6.4 on AVR.