https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121208

            Bug ID: 121208
           Summary: Wrong user-level interrupt vector value with TLS
                    variable when build with optimisation
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: charles.goedefroit at eviden dot com
  Target Milestone: ---
            Target: x86_64-linux-gnu
             Build: /usr/src/gcc/configure --build=x86_64-linux-gnu
                    --disable-multilib --enable-languages=c,c++,fortran,go

The issue affects GCC versions 11, 12, 13, 14 and 15.

## Some context

User-level interrupts (UINTR) are a hardware feature introduced by Intel with
Sapphire Rapids processors.

They allow an interrupt handler to be registered in user space, so that
interrupts are delivered without going through the operating system (OS bypass).

A thread that wants to receive interrupts needs to:
- register an interrupt handler (`ui_handler` in our code) with the syscall
`uintr_register_handler(ui_handler, flags)` (`syscall(471, ui_handler, 0)`);
- request a file descriptor associated with a user-level interrupt vector (UVEC)
with the syscall `uvec_fd = uintr_vector_fd(UVEC, 0)` (`uvec_fd = syscall(473, 6,
0)`);
- unmask user-level interrupts with the STUI instruction (`_stui()`);
- share the `uvec_fd` file descriptor with all sender threads.

A thread that wants to send a user-level interrupt must be registered as a
sender with `uipi_index = uintr_register_sender(uvec_fd, flags)` (`uipi_index =
syscall(474, uvec_fd, 0)`) and can then use the SENDUIPI instruction
(`_senduipi(uipi_index)`) to trigger an interrupt.
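
As a rough sketch, the three syscalls listed above could be wrapped as follows. The wrapper names mirror the names used in this report, the macro names `UINTR_NR_*` are made up for this illustration, and the syscall numbers are simply the ones quoted above. They are an assumption that only holds on a kernel carrying the UINTR patches, so this code should not be run on any other kernel. The `_stui()` and `_senduipi()` steps are not shown because they need `-muintr`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* make sure unistd.h declares syscall() */
#endif
#include <unistd.h>

/* Syscall numbers as quoted in this report (assumption: UINTR-patched kernel). */
#define UINTR_NR_REGISTER_HANDLER 471
#define UINTR_NR_VECTOR_FD        473
#define UINTR_NR_REGISTER_SENDER  474

/* Receiver: register the user-level interrupt handler for this thread. */
long uintr_register_handler(void *handler, unsigned int flags)
{
    return syscall(UINTR_NR_REGISTER_HANDLER, handler, (long)flags);
}

/* Receiver: obtain a file descriptor bound to one interrupt vector. */
long uintr_vector_fd(unsigned long vector, unsigned int flags)
{
    return syscall(UINTR_NR_VECTOR_FD, (long)vector, (long)flags);
}

/* Sender: register against the receiver's fd; the result is the index
 * that is later passed to _senduipi(). */
long uintr_register_sender(int uvec_fd, unsigned int flags)
{
    return syscall(UINTR_NR_REGISTER_SENDER, (long)uvec_fd, (long)flags);
}
```

A receiver would call `uintr_register_handler`, then `uintr_vector_fd`, then `_stui()`; a sender would call `uintr_register_sender` and then `_senduipi()` with the returned index.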

A user-level interrupt handler (`ui_handler`) is called with parameters on the
stack.
The last parameter is the user-level interrupt vector (uvec).

We use a thread-local storage (TLS) variable.

We create a shared library to manage UINTR and we create a small reproducer in
`intrHandler.c`.

Build command:

```bash
# build libintrHandler_opt.so
gcc -Wall -Wextra -DNDEBUG -muintr -g -O3 -fPIC -c -save-temps \
    -o intrHandler_opt.pic.o intrHandler.c
gcc intrHandler_opt.pic.o -shared -o libintrHandler_opt.so
# build ./uintr2Threads_opt
gcc -L. -Wl,-rpath=. -Wall -Wextra -DNDEBUG -muintr -g \
    -o uintr2Threads_opt uintr2Threads.c -lintrHandler_opt
```

## Our issue.

The `uvec` parameter of the `ui_handler` interrupt handler is loaded into a
caller-saved register (`%rcx`), but that register is not saved before the call
to `__tls_get_addr`, so the later check on `uvec` sees an invalid value.

In `ui_handler` we set a global TLS variable (`th_in_interrupt_handler`) to 1 so
that we know whether we are in interrupt context, and we set it back to 0
before `ui_handler` returns.

Just after setting the `th_in_interrupt_handler` TLS variable to 1,
we check the `uvec` value to distinguish between vectors and perform
different actions.
In our example we only check one `uvec` value (6) because that is enough to
reproduce the bug.
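
The flag-and-dispatch pattern described above can be modelled with an ordinary function, which makes the intended behaviour easy to check without UINTR hardware. This is only a sketch: `handle_vector`, `count_call`, `callback_calls`, `flag_seen_in_callback` and `demo` are hypothetical names, the `int` type of the flag is an assumption, and the model returns -1 on an unexpected vector where the real handler calls `exit(uvec)`.

```c
#include <assert.h>
#include <stdint.h>

#define UVEC 6   /* vector number used in this report */

/* Per-thread flag: 1 while the thread is inside the handler body. */
__thread int th_in_interrupt_handler = 0;

/* Hypothetical callback that records what it observed. */
int callback_calls = 0;
int flag_seen_in_callback = -1;
void count_call(void)
{
    callback_calls++;
    flag_seen_in_callback = th_in_interrupt_handler;
}
void (*callback)(void) = count_call;

/* Plain-function model of the handler body: set the flag, dispatch on
 * the vector, clear the flag. Returns 0 on a match, -1 otherwise. */
int handle_vector(uint64_t uvec)
{
    int rc;
    th_in_interrupt_handler = 1;
    if (uvec == UVEC) {
        (*callback)();
        rc = 0;
    } else {
        rc = -1;
    }
    th_in_interrupt_handler = 0;
    return rc;
}

/* Usage sketch exercising both branches. */
void demo(void)
{
    assert(handle_vector(UVEC) == 0);      /* matching vector runs the callback */
    assert(callback_calls == 1);
    assert(flag_seen_in_callback == 1);    /* flag was set while the callback ran */
    assert(th_in_interrupt_handler == 0);  /* flag is cleared on the way out */
    assert(handle_vector(7) == -1);        /* any other vector takes the else path */
    assert(callback_calls == 1);
}
```

With a correctly compiled handler, sending vector 6 should correspond to the first case in `demo`; the bug reported here makes the optimized handler behave like the second case.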

When we build without optimization (`-O0`), everything works:
we send UVEC 6 and the `if` statement in `ui_handler` takes the `uvec == 6`
branch.

When we build with optimization (`-O1`, `-O2` or `-O3`), `uvec` holds an
invalid value:
we send UVEC 6 but the `if` statement in `ui_handler` takes the `else` branch
with an invalid value.

So we check the assembly code with `objdump -dS --disassemble=ui_handler
libintrHandler_opt.so`.

```txt
    117f:       48 83 ec 08             sub    $0x8,%rsp
    1183:       48 8b 4c 24 60          mov    0x60(%rsp),%rcx
        th_in_interrupt_handler = 1;
    1188:       fc                      cld
    1189:       66 48 8d 3d 1f 2e 00    data16 lea 0x2e1f(%rip),%rdi        # 3fb0 <th_in_interrupt_handler@Base>
    1190:       00
    1191:       66 66 48 e8 b7 fe ff    data16 data16 rex.W call 1050 <__tls_get_addr@plt>
    1198:       ff
    1199:       c6 00 01                movb   $0x1,(%rax)
        if(uvec == UVEC) {
    119c:       48 83 f9 06             cmp    $0x6,%rcx
    11a0:       75 2d                   jne    11cf <ui_handler+0x5f>
```

In the assembly, we see that `uvec` is loaded into the `%rcx` register, then
the libc `__tls_get_addr@plt` is called, and finally the `if` check is done
(`cmp $0x6,%rcx`).
So the `%rcx` register is not saved before the libc `__tls_get_addr@plt` call.
`%rcx` is a caller-saved register and must be saved before any function call.
During the `__tls_get_addr@plt` call, the `%rcx` register changes and is not
restored.

To generate valid code, I add an `asm volatile` line that clobbers `%rcx` at
the beginning of `ui_handler`.

```c
__attribute__((target("general-regs-only")))
__attribute__((interrupt))
void ui_handler(__attribute__((unused)) struct __uintr_frame *ui_frame,
                uint64_t uvec) {
    /* Workaround: declaring %rcx clobbered keeps uvec out of %rcx. */
    asm volatile ("nop" : : : "%rcx");
    th_in_interrupt_handler = 1;
    if(uvec == UVEC) {
        (*callback)();
    } else {
        exit(uvec);
    }
    th_in_interrupt_handler = 0;
}
```

With this change, `%r8` is used instead of `%rcx` and the `uvec` value is valid
when built with optimization.

```txt
    117f:       48 83 ec 08             sub    $0x8,%rsp
    1183:       4c 8b 44 24 60          mov    0x60(%rsp),%r8
        asm volatile ("nop" : : : "%rcx");
    1188:       90                      nop
        th_in_interrupt_handler = 1;
    1189:       fc                      cld
    118a:       66 48 8d 3d 1e 2e 00    data16 lea 0x2e1e(%rip),%rdi        # 3fb0 <th_in_interrupt_handler@Base>
    1191:       00
    1192:       66 66 48 e8 b6 fe ff    data16 data16 rex.W call 1050 <__tls_get_addr@plt>
    1199:       ff
    119a:       c6 00 01                movb   $0x1,(%rax)
        if(uvec == UVEC) {
    119d:       49 83 f8 06             cmp    $0x6,%r8
    11a1:       75 2d                   jne    11d0 <ui_handler+0x60>
```

`gcc -v` output:

```txt
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-linux-gnu/15.1.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu
--disable-multilib --enable-languages=c,c++,fortran,go
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.1.0 (GCC)
```
