On Thu, 14 Aug 2025 13:36:53 PDT (-0700), jeffreya...@gmail.com wrote:


On 8/14/25 8:23 AM, Vineet Gupta wrote:
__builtin_round() fails to save/restore FP exception flags around the FP
compare insn which can potentially clobber the same.

Worth noting that the fflags restore bracketing is slightly different
than the glibc implementation. gcc implementation w/ this patch
generates following, where fsflags is called even if fcvt* were not
executed because the prior flt can also clobber the flags. glibc
implementation due to early NaN check (before the flt.s) only needs
inside of the branch.

| convert_float_to_float_round
| ...
|   frflags     a5
|   fabs.s      fa5,fa0
|   flt.s       a4,fa5,fa4    <--- can clobber fflags
|   beq a4,zero,.L3
|     fcvt.w.s a4,fa0,rmm     <--- also
|     fcvt.s.w  fa5,a4
|     fsgnj.s   fa0,fa5,fa0
| .L3:
|    fsflags    a5            <-- both code paths

        PR target/121534

gcc/ChangeLog:

        * config/riscv/riscv.md (round_pattern): save/restore fflags.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/round_64.c: Scan for frflags and fsflags.
So I'm guessing the issue at hand is in that rounding path that does not
use Zfa we've introduced a flt instruction where we didn't have one
before (previously we likey called into glibc).  And so now we have to
manage the accrued exception state?  Essentially saving/restoring state
around the round() case?

IIUC the difference is that we used to call into glibc for all of these, and glibc has the flag save/restore. It also has the NaN-handling difference that Vineet pointed out, but I remember when this all came up we went through the special case handling for these and convinced ourselves it was fine.

That said, don't trust me on FP stuff ;)

My recollection is that sequence was dramatically better than what we
had before, so even with a bit mucking around with FP CSRs it's likely
still profitable.

At a bare minimum it's inlining the call to libm, which is probably profitable just due to the fallout from a call. So seems sane to have at -O2 (no idea what happens in -Os). I think we can just take the fix without deciding on that difference, though ;)

It failed CI, presumably a scan test:

https://github.com/ewlu/gcc-precommit-ci/issues/3774#issuecomment-3188777455

Assuming the scan test just needs the count updated, this is OK for the
trunk.

Acked-by: Palmer Dabbelt <pal...@dabbelt.com>


Thanks,
Jeff

Reply via email to