On 3/19/24 10:23 AM, Palmer Dabbelt wrote:
On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:


On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
As RV has round instructions it is reasonable to use them instead of
calling the library functions.

With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup

I'm not sure exactly what context this is in, but my reading of "according to the current rounding mode" means we'd usually use the dynamic rounding mode.
As Andrew W. noted, we're dealing with ceil and thus rup is the correct rounding mode to use here.




My only worry here is that when we were doing the other patterns we decided not to do rint.  I can't remember exactly why, but from reading the docs we probably just skipped it due to the inexact handling and Zfa having an instruction that just does this.  FP stuff is always a bit of a time sink, so at that point it probably just fell off the priority list.
rint is supposed to raise FE_INEXACT, so it's actually a good match for RISC-V fcvt semantics as they appropriately raise FE_INEXACT.

nearby* do not arise FE_INEXACT and thus would rely on the new Zfa instructions where we have ones that do not raise FE_INEXACT or they need to be conditional on flag_fp_int_builtin_inexact. One could reasonably argue that when flag_fp_int_builtin_inexact is enabled that a call to nearby* ought to be converted into a call to rint*.



I'm not really an FP guy, so I usually just poke around what the other ports generate and try to figure out what's going on.  arm64 has the Zfa instruction and x86 FP is complicated, so I'm not sure exactly who else to look at for this sort of stuff.  From just looking at the code, though, I think there's two issues -- I'm not really an FP person, though, so take this with a grain of salt:
Right. And the condition under which we use the new sequence for ceil/round actually borrows from x86. Essentially we only use the new sequence when we've been told we don't care about FE_INEXACT or fp exceptions in general.


IIUC what we've got here doesn't actually set the inexact flag for the bounds check failure case, as we're just loading up an exact constant and doing the bounds check.  We're also not clamping to INT_MAX-type values, but not sure if we're supposed to.  I think we could fix both of those by adjusting the expansion to something like
The state of FE_INEXACT is a don't care here due the condition on the expansion code.



              fabs.d  fa4,fa0
              lui     a5,%hi(.LC0)
              fld     fa3,%lo(.LC0)(a5)
              flt.d   a5,fa4,fa3
              bne     a5,zero,.L3
              mv      fa0, fa3
     .L3:
              fcvt.l.d a5,fa0,rup
              fcvt.d.l        fa4,a5
              fsgnj.d fa0,fa4,fa0
              ret

and then adjusting the constant to be an epsilon larger than INT_MAX so it'll still trigger the clamping but also inexact.
I think Jivan's sequence is more correct. It's not just INT_MAX here that's concerning, there's a whole class of values that cause problems.


There's also a pair of changes to the ISA in 2020 that added the conversion inexact handling requirement, it was a grey area before.  I don't remember exactly what happened there, but I did remember it happening.  I don't think anyone cares all that much about the performance of systems that target the older ISAs, so maybe we just restrict the non-libcall expansion to ISAs that contain the new wording?
I think all this got sufficiently cleaned up. The spec is explicit about when FE_INEXACT gets raised on the fcvt instructions. I referred to it repeatedly when analyzing Jivan's work.

We can hash through the final items in a few weeks once the trunk re-opens for development.

Jeff

Reply via email to