Issue 150120
Summary Missed optimization: Lowering struct materialization into cold branches
Labels new issue
Assignees
Reporter jeremy-rifkin
Clang generates sub-optimal code for the following:

```cpp
struct S {
    int& x;
    int& y;
    bool check() {
        return x < y;
    }
};

[[noreturn]] [[gnu::cold]] void bar(const S& s);

void foo(int a, int b) {
    S s{a, b};
    if(s.check()) [[unlikely]] { // very unlikely and cold
        bar(s);
    }
}
```

```asm
foo(int, int):
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     dword ptr [rsp], esi
        lea     rax, [rsp + 4]
        mov     qword ptr [rsp + 8], rax
        mov     rax, rsp
        mov     qword ptr [rsp + 16], rax
        cmp     edi, esi
        jl      .LBB0_2
        add     rsp, 24
        ret
.LBB0_2:
        lea     rdi, [rsp + 8]
        call    bar(S const&)@PLT
```

Struct `S` must be on the stack in order to call `bar()`, but that is only needed in the unlikely case that the cold branch is taken. Ideally the codegen would be the following:

```asm
foo(int, int):
        cmp     edi, esi
        jl      .LBB0_2
        ret
.LBB0_2:
        sub     rsp, 24
        ... copy edi/esi to the stack and make struct S ...
        call    bar(S const&)@PLT
```

MSVC generates something along these lines; GCC and Clang do not: https://godbolt.org/z/4axKfoe8x

I can't simply write `if(a < b)` or delay the construction of `S` until inside the branch. My specific use case that results in code like this is [libassert](https://github.com/jeremy-rifkin/libassert), where an expression template is built from the user's condition, and that template is evaluated and then inspected on assertion failure.
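
To illustrate why the struct has to exist before the branch, here is a minimal sketch of the expression-template pattern. All names (`less_expr`, `captured`, `decomposer`, `describe`) are hypothetical and are not libassert's actual implementation; `describe` returns a string instead of being `[[noreturn]]` purely so the sketch can be exercised.

```cpp
#include <cassert>
#include <string>

// Hypothetical expression node: holds both operands by reference,
// just like struct S in the example above.
template <typename A, typename B>
struct less_expr {
    const A& lhs;
    const B& rhs;
    bool evaluate() const { return lhs < rhs; }
};

// Hypothetical wrapper for the captured left-hand operand.
template <typename A>
struct captured {
    const A& value;
    template <typename B>
    less_expr<A, B> operator<(const B& rhs) const { return {value, rhs}; }
};

// Hypothetical entry point: `decomposer{} << a < b` parses as
// `(decomposer{} << a) < b` because << binds tighter than <.
struct decomposer {
    template <typename A>
    captured<A> operator<<(const A& value) const { return {value}; }
};

// Stand-in for the cold failure path: it inspects the operands through
// the references stored in the expression node.
template <typename A, typename B>
std::string describe(const less_expr<A, B>& e) {
    return std::to_string(e.lhs) + " < " + std::to_string(e.rhs);
}

void demo() {
    int a = 1, b = 2;
    auto e = decomposer{} << a < b;  // built before the result is known
    assert(e.evaluate());
    assert(describe(e) == "1 < 2");
}
```

The expression node is built first and evaluated second, so at the source level its construction cannot be moved inside the branch that depends on its result.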

Even if the code is written as follows, clang still generates sub-ideal code:

```cpp
void foo(int a, int b) {
    if(a < b) [[unlikely]] { // very unlikely and cold
        S s{a, b};
        bar(s);
    }
}
```

```asm
foo(int, int):
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     dword ptr [rsp], esi
        cmp     edi, esi
        jl      .LBB0_2
        add     rsp, 24
        ret
.LBB0_2:
        lea     rax, [rsp + 4]
        mov     qword ptr [rsp + 8], rax
        mov     rax, rsp
        mov     qword ptr [rsp + 16], rax
        lea     rdi, [rsp + 8]
        call    bar(S const&)@PLT
```

This may be a tricky optimization to perform. However, given the use case above, I expect it would benefit a large amount of code.
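
For reference, the desired shape can be approximated by hand by outlining the cold path into a helper that takes the operands by value, so that the copies and the struct live in the helper's frame. This is only a sketch: `foo_cold` is a hypothetical name, `bar` is given a throwing stand-in body so the code can be exercised, and I have not verified what any particular compiler version emits for it. It also does not apply directly to the expression-template case described above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct S {
    int& x;
    int& y;
    bool check() const { return x < y; }
};

// Stand-in body for bar(); the original only declares it.
[[noreturn]] [[gnu::cold]] void bar(const S& s) {
    throw std::runtime_error("check failed: " + std::to_string(s.x) +
                             " < " + std::to_string(s.y));
}

// Hypothetical outlined cold path: a and b are by-value parameters, so
// the stack slots that S refers to belong to this function, not to foo.
[[noreturn]] [[gnu::cold]] [[gnu::noinline]] void foo_cold(int a, int b) {
    S s{a, b};
    bar(s);
}

void foo(int a, int b) {
    if (a < b) [[unlikely]] {
        foo_cold(a, b);
    }
}

void demo() {
    foo(2, 1);  // hot path: returns normally
    bool threw = false;
    try {
        foo(1, 2);  // cold path: the stand-in bar() throws
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```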