https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115910

            Bug ID: 115910
           Summary: [15 Regression] ((unsigned)x)/3 with a range for
                    (unsigned)x that does not have the sign bit set seems
                    to produce much worse code
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, needs-bisection
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64

Take:
```
int f1(int x) {
  if (x < 0) 
    __builtin_abort();
  return ((unsigned)x)/3;
}

int f2(int x) {
  return ((unsigned)x)/3;
}

```

I would assume that f2 and f1 produce almost the same code, except that f1
would include a test for the sign bit.

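But in the case of f1, the division is worse: it uses the signed-division
sequence, which is one instruction longer and includes a sign fix-up that the
known range makes unnecessary: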
```
        movsx   rax, edi
        sar     edi, 31
        imul    rax, rax, 1431655766
        shr     rax, 32
        sub     eax, edi
```
vs f2:
```
        mov     eax, edi
        mov     edx, 2863311531
        imul    rax, rdx
        shr     rax, 33
```

In GCC 14, both functions had the same code for the division:
```
        mov     eax, edi
        mov     edx, 2863311531
        imul    rax, rdx
        shr     rax, 33
```
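
As a sanity check, here is a minimal C sketch of the two sequences above. The helper names (`div3_signed_seq`, `div3_unsigned_seq`, `sequences_agree`) are made up for illustration and are not from GCC. It only samples the non-negative range, and it assumes that `>>` on a negative `int32_t` is an arithmetic shift, which is what GCC does.

```c
#include <stdint.h>

/* Signed-style sequence emitted for f1 (helper name is hypothetical). */
static uint32_t div3_signed_seq(int32_t x) {
  int64_t prod = (int64_t)x * 1431655766;          /* movsx + imul */
  uint32_t hi = (uint32_t)((uint64_t)prod >> 32);  /* shr rax, 32 */
  int32_t sign = x >> 31;  /* sar edi, 31; assumes arithmetic shift */
  return hi - (uint32_t)sign;                      /* sub eax, edi */
}

/* Unsigned sequence emitted for f2 (helper name is hypothetical). */
static uint32_t div3_unsigned_seq(uint32_t x) {
  uint64_t prod = (uint64_t)x * 2863311531u;       /* mov + mov + imul */
  return (uint32_t)(prod >> 33);                   /* shr rax, 33 */
}

/* Returns 1 if both sequences match x / 3 on a sample of [0, INT32_MAX]. */
static int sequences_agree(void) {
  for (uint32_t x = 0; x <= (uint32_t)INT32_MAX; x += 1021u) {
    uint32_t expected = x / 3u;
    if (div3_signed_seq((int32_t)x) != expected) return 0;
    if (div3_unsigned_seq(x) != expected) return 0;
  }
  return 1;
}
```

Within that range the `sar`/`sub` pair only ever subtracts zero, so the shorter unsigned sequence should be usable for f1 as well.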

The range for the dividend is also the same in GCC 14 and 15:
```
  # RANGE [irange] unsigned int [0, 2147483647] MASK 0x7fffffff VALUE 0x0
  x.0_1 = (unsigned intD.9) x_3(D);
```

For some reason, aarch64 and arm still produce the better code.
