https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121262

            Bug ID: 121262
           Summary: (x86) GCC sometimes produces 'cmp' instructions of
                    larger register width
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

I'm filing this issue for x86 architectures only, but the issue might also
exist for other architecture targets.

When a value is loaded through a pointer and stored in a variable of a larger
bit width, GCC can recognize that the upper bits of the new variable are zero,
but it does not use that fact when emitting the 'cmp' instruction.

An example can show the issue:

```c
#include <stdint.h>

uint64_t func1_a(uint32_t *p) {
    uint64_t value = *p;
    value &= 0xFFFFFFFF; // Should be no-op
    if (value < 0x12345678) {
        return value;
    }
    return 0x40000000;
}
uint64_t func1_b(uint32_t *p) {
    uint64_t value = *p;
    value &= 0xFFFFFFFF; // Should be no-op
    if ((uint32_t)value < 0x12345678) {
        return value;
    }
    return 0x40000000;
}
```

x86-64 GCC 15.1 with the '-Os' option (tested in Compiler Explorer)
produces:

```assembly
func1_a:
        movl    (%rdi), %eax
        movl    $1073741824, %edx
        cmpq    $305419896, %rax
        cmovnb  %rdx, %rax
        ret
func1_b:
        movl    (%rdi), %eax
        movl    $1073741824, %edx
        cmpl    $305419896, %eax
        cmovnb  %edx, %eax
        ret
```

Note that "cmp rax, <constant>" (`cmpq` in the listing above) is used in
`func1_a`. This is unnecessary: the 64-bit form needs a REX.W prefix, which
makes the instruction one byte larger than "cmp eax, <constant>" (`cmpl`).

This issue does not appear when "value" is not loaded through a pointer. (You
can compare the code with `func1_noptr` in the additional test code below.)

Additional test code:

```c
uint32_t func2_a(uint16_t *p) {
    uint32_t value = *p;
    value &= 0xFFFF; // Should be no-op
    if (value < 0x1234) {
        return value;
    }
    return 0x4000;
}
uint32_t func2_b(uint16_t *p) {
    uint32_t value = *p;
    value &= 0xFFFF; // Should be no-op
    if ((uint16_t)value < 0x1234) {
        return value;
    }
    return 0x4000;
}
uint32_t func3_a(uint8_t *p) {
    uint32_t value = *p;
    value &= 0xFF; // Should be no-op
    if (value <= 0x7F) {
        return value;
    }
    return (uint32_t)-1;
}
uint32_t func3_b(uint8_t *p) {
    uint32_t value = *p;
    value &= 0xFF; // Should be no-op
    if ((uint8_t)value <= 0x7F) {
        return value;
    }
    return (uint32_t)-1;
}

// `func1_noptr`, `func2_noptr` and `func3_noptr` have no issues
uint64_t func1_noptr(uint32_t x) {
    uint64_t value = x;
    if (value < 0x12345678) {
        return value;
    }
    return 0x40000000;
}
uint32_t func2_noptr(uint16_t x) {
    uint32_t value = x;
    if (value < 0x1234) {
        return value;
    }
    return 0x4000;
}
uint32_t func3_noptr(uint8_t x) {
    uint32_t value = x;
    if (value <= 0x7F) {
        return value;
    }
    return (uint32_t)-1;
}
```

The 8-bit version of the test code, `func3`, is the actual problem I'm facing.
I want to check whether a byte in an array is in the [0x00, 0x7F] range (as
part of checking whether a string is ASCII), and GCC misses the opportunity to
produce smaller code there.
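
For context, an ASCII check of that kind could look roughly like the sketch
below. The function name `is_ascii` and its signature are hypothetical and
only illustrate the use case; the generated code for this exact loop was not
examined in this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption, not taken from a real code base):
   returns 1 if every byte in buf[0..len) is in the [0x00, 0x7F] range,
   and 0 otherwise. An empty buffer counts as ASCII. */
int is_ascii(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        /* Widened load through a pointer, the same pattern as func3_a. */
        uint32_t value = buf[i];
        if (value > 0x7F) {
            return 0;
        }
    }
    return 1;
}
```

The comparison here has the same shape as the one in `func3_a`: a byte is
loaded through a pointer, widened to 32 bits, and compared against 0x7F.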

`func3_a` and `func3_b` should be equivalent.
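
The equivalence can be checked exhaustively, since there are only 256 possible
input bytes. The sketch below repeats the two functions from the test code and
adds a hypothetical helper, `func3_variants_agree`, that is not part of the
original test code.

```c
#include <stdint.h>

/* Copies of func3_a and func3_b from the test code. */
uint32_t func3_a(uint8_t *p) {
    uint32_t value = *p;
    value &= 0xFF; // Should be no-op
    if (value <= 0x7F) {
        return value;
    }
    return (uint32_t)-1;
}
uint32_t func3_b(uint8_t *p) {
    uint32_t value = *p;
    value &= 0xFF; // Should be no-op
    if ((uint8_t)value <= 0x7F) {
        return value;
    }
    return (uint32_t)-1;
}

/* Hypothetical helper: returns 1 if both variants return the same result
   for every possible byte value, and 0 otherwise. */
int func3_variants_agree(void) {
    for (unsigned i = 0; i <= 0xFF; i++) {
        uint8_t byte = (uint8_t)i;
        if (func3_a(&byte) != func3_b(&byte)) {
            return 0;
        }
    }
    return 1;
}
```

Since both variants agree on every input, the narrower comparison in `func3_b`
is a valid replacement for the wider one in `func3_a`.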
