On 7/3/25 3:50 PM, Palmer Dabbelt wrote:
This is really Jim's code, but it's been sitting around in Bugzilla for a while
so I've picked it up.  All I really did here is add a target hook and mangle
some comments, but I think I understand enough about what's going on to try and
get things moving forward.  So I'm writing up a pretty big cover letter to try
and summarize what I think is going on here, as it's definitely not something I
fully understand yet.

We've got a quirk in the RISC-V ABI where DF arguments on rv32 get split into
an X register and a 32-bit aligned stack slot.  The middle-end prologue code
just stores out the X register and treats the argument as if it was entirely
passed on the stack.  This can result in a misaligned load, and those are still
slow on a bunch of RISC-V systems.
Weird. I'd been constructing a response over the last week or so, a little bit at a time, but I can't find it or any evidence of it.

Anyway, this isn't that weird of a quirk; other targets have partial-in-regs arguments. The PA SOM ABI had this property IIRC, but after playing with it a bit it's not useful to review for insight, as it's a strict-alignment target.

I also checked another old favorite, the V8, but it doesn't have DF mode moves, so it's not particularly interesting for that case.
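
The example under discussion isn't quoted in this reply; a hypothetical reconstruction of the kind of signature that triggers the split (the argument names and counts are illustrative, assuming the ilp32d ABI where eight doubles exhaust fa0-fa7 and seven ints consume a0-a6) would be:

```c
/* Hypothetical reconstruction of the split-argument case: with fa0-fa7
   and a0-a6 already consumed, "split" is passed with its low half in
   a7 and its high half in a 32-bit-aligned caller stack slot. */
double foo(double f0, double f1, double f2, double f3,
           double f4, double f5, double f6, double f7,
           int x0, int x1, int x2, int x3, int x4, int x5, int x6,
           double split)
{
    return split;
}
```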



Without this patch (and with this patch on targets with fast misaligned
accesses) that generates

         sw      a7,12(sp)
         fld     fa0,12(sp)

and with this patch (on a subtarget with slow misaligned access) ends up as

         lw      a5,16(sp)
         sw      a7,8(sp)
         sw      a5,12(sp)
         fld     fa0,8(sp)

That looks a little odd, but I think it's actually good code -- the only way to
get a double into a register on rv32 is to load it from memory, so without
misaligned loads we're sort of just stuck there.
Right. Both sequences likely trigger store-to-load forwarding issues. The second is probably better when we don't have fast unaligned access.


While playing around writing this cover letter I came up with another case
that's essentially

     long long foo(..., long long split) { return split; }

that used to generate

         sw      a7,12(sp)
         lw      a0,12(sp)
         lw      a1,16(sp)

and now generates

         lw      a1,0(sp)
         mv      a0,a7
Yea, but I think this would "just work" if we killed movdi for rv32. That's a hole I'm not going to dive into (I can't justify the amount of time it would likely take to chase down regressions), but it's a hole with a bit of cheese in it for someone who really cares about rv32.

Essentially for the long long case the existence of movdi means the component moves aren't seen and optimized away by CSE.
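
For reference, a compilable version of the long long case above (the leading arguments are elided in the original; the seven ints here are a hypothetical expansion that fills a0-a6 on rv32/ilp32) would be:

```c
/* Hypothetical expansion of the elided arguments: seven ints fill
   a0-a6, so "split" is passed with its low half in a7 and its high
   half in a caller stack slot. */
long long foo(int x0, int x1, int x2, int x3, int x4, int x5, int x6,
              long long split)
{
    return split;
}
```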

The target hook will need some adjustment, but ultimately I'm not even sure if
a target hook is the way to go here.  It was just an easy way to flip the
behavior so I could play around with some of Jim's code.  It kind of feels like
the load/subword merge version would result in better code in general, but I'm
not sure on that one.

That said, I figured I'd just send it out so others could see this.  It's very
much out of my wheel house, so I'd be shocked if this doesn't cause any
failures...
Not sure if it's your patch or some other instability, but pre-commit testing certainly wasn't happy; it failed to build glibc.

Conceptually I don't see any other viable approach for DF and slow unaligned access. There's no direct way to transfer the value from the GPRs into the FPRs; you have to go through memory, which is a time-honored tradition. Folks have spent a ton of time on these problems in the past.

Given that the value has to be in memory at some point, and it can't be in the original slot because that slot is unaligned, the only viable path is to allocate a new slot and initialize it efficiently. That can either happen during initial RTL generation, as Jim and you have done, or we could make it possible to hold DF values in GPRs, split post-reload, and hope DSE cleans things up.
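
The "through memory" round trip can be sketched at the C level (pair_to_double is an illustrative name; RISC-V is little-endian, so the low word comes first):

```c
#include <stdint.h>
#include <string.h>

/* C-level equivalent of the memory round trip: with no direct
   GPR->FPR move for a 64-bit value on rv32, the compiler stores the
   two 32-bit halves to a fresh stack slot (sw + sw) and loads the
   double back (fld). */
double pair_to_double(uint32_t lo, uint32_t hi)
{
    uint32_t slot[2] = { lo, hi };  /* the newly allocated slot */
    double d;
    memcpy(&d, slot, sizeof d);     /* the fld from that slot */
    return d;
}
```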

Anyway, ISTM that attacking this at RTL generation time is the better choice. So you'd need to determine whether your patch is causing the glibc failure and, if so, address it.

Jeff

