https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121591

--- Comment #3 from Kang-Che Sung <Explorer09 at gmail dot com> ---
(In reply to ak from comment #2)
> Many x86 targets have limits on how many branches their branch predictor can
> track per 16 byte line so what you are asking for is likely slower. On
> others there are also similar limits what the decoded icache can cache per
> 32 bytes.

Is that really a reason not to optimize this?

My testing shows that when isless(a, b) and isgreater(a, b) are used
together, the UCOMISD instructions are not merged.

This happens even when the branch instructions are less than 16 bytes away
from their float compare instructions. As a result, for example, this simple
float compare function compiles to more code than necessary:

```c
// It is expected that this can be used as a compare function in qsort()
int float_compare2(const double *a, const double *b) {
    if (*a > *b)
        return 1;
    if (*a < *b)
        return -1;
    return 0;
}
```
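
For reference, the isless()/isgreater() combination mentioned above could be written roughly as in the sketch below. The function name float_compare_macros is made up for illustration, and the exact test case attached to the bug report may differ:

```c
#include <math.h>

/* Hypothetical variant (name invented for illustration): the same
   three-way compare, written with the C99 quiet comparison macros. */
int float_compare_macros(const double *a, const double *b) {
    if (isgreater(*a, *b))
        return 1;
    if (isless(*a, *b))
        return -1;
    return 0; /* equal, or unordered when either operand is NaN */
}
```

Both conditions test the same pair of operands, so in principle one compare instruction could feed both branches.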

I tested this even with the '-Oz' option, which optimizes for size and so
should not be trading code size for performance.
