[Bug tree-optimization/115529] Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4)

Explorer09 at gmail dot com via Gcc-bugs Tue, 04 Mar 2025 01:04:37 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115529


--- Comment #2 from Kang-Che Sung <Explorer09 at gmail dot com> ---
Since I've run into this problem again. I think it's good rephrase the issue,
and provide another example code.

This optimization feature request is about comparing whether a variable is in a
specific range, where:
(a) the variable is unsigned,
(b) the range values to compare are compile time constants, and
(c) the range values are "aligned" somehow to power-of-two boundaries 

There would be various ways to perform such comparison, and the compiler should
recognize them as equivalent, and, if possible, pick one of the forms that
results in best code (in terms of performance, code size or both).

```c
unsigned int pred2(unsigned int x) {
    // The naive, intuitive range comparison
    return x >= 0xD800 && x <= 0xDFFF;
}
unsigned int pred2_sub(unsigned int x) {
    return (x - 0xD800) <= (0xDFFF - 0xD800);
}
unsigned int pred2_bitand(unsigned int x) {
    return (x &= ~0x7FF) == 0xD800;
}
unsigned int pred2_bitor(unsigned int x) {
    return (x |= 0x7FF) == 0xDFFF;
}
unsigned int pred2_rshift(unsigned int x) {
    return (x >>= 11) == (0xD800 >> 11);
}
unsigned int pred2_div(unsigned int x) {
    return (x / 0x800) == (0xD800 / 0x800);
}
```

All of these comparisons (subtraction, division, bitwise AND, bitwise OR, and
right shift) are equivalent.

GCC currently (as of version 14.2) can recognize pred2_sub() and pred2_div()
and translate the latter to the former, but it seems to miss pred2_bitand(),
pred2_bitor() and pred2_rshift().

(On the other hand, Clang 19.1.0 can recognize all five and translate all of
them to pred2_bitand() code. Even with "-Os" option, which is not the best for
code size.)

When tuning my code for small code size, I often struggle with which form of
the five to use, in order to make the smallest size possible. With this
particular example (about Unicode surrogate code points), I found out
pred2_rshift_b() approach makes the smallest code, and thus I expect "-Os"
option translates to that code.

```c
unsigned int pred2_sub_b(unsigned int x) {
    x -= 0xD800; __asm__ ("" : "+r" (x));
    return x <= (0xDFFF - 0xD800);
} // x86_64: 18 bytes; riscv32: 20 bytes;

unsigned int pred2_bitand_b(unsigned int x) {
    x &= ~0x7FF; __asm__ ("" : "+r" (x));
    return x == 0xD800;
} // x86_64: 18 bytes; riscv32: 18 bytes;

unsigned int pred2_bitor_b(unsigned int x) {
    x |= 0x7FF; __asm__ ("" : "+r" (x));
    return x == 0xDFFF;
} // x86_64: 18 bytes; riscv32: 16 bytes;

unsigned int pred2_rshift_b(unsigned int x) {
    x >>= 11; __asm__ ("" : "+r" (x));
    return x == (0xD800 >> 11);
} // x86_64: 12 bytes; riscv32: 10 bytes;

unsigned int pred2_div_b(unsigned int x) {
    // GCC 14.2 translates this to pred2_rshift_b() if the inline asm is used,
    // but pred2_sub() if the inline asm is removed.

    x /= 0x800; __asm__ ("" : "+r" (x));
    return x == (0xD800 / 0x800);
}
```

[Bug tree-optimization/115529] Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4)

Reply via email to