Re: [PATCH v2] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

Vineet Gupta Tue, 24 May 2022 18:36:38 -0700



On 5/24/22 18:32, Palmer Dabbelt wrote:

Ping, IMO this needs to be (re)considered for trunk.
This goes really nicely with riscv_slow_unaligned_access_p==false to
elide the unrolled tail copies for the trailing word/half-word/byte accesses.

@Kito, @Palmer? Just from a codegen POV, this seems to be a no-brainer.
Has anything changed since this was posted?
IIRC the discussion essentially boiled down to the overlapping store likely being a hard case on in-order machines (like the C906), but there weren't any benchmarks or documentation from which we could figure that out. I don't see how this is an obvious win: sure, it's fewer ops (and, assuming a uniform distribution, fewer misaligned accesses, though I don't know how reasonable uniform distributions are here), but it's only a small upside, so that hard case would have to be fast in order for this to be better code.
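To make the op-count versus misalignment trade-off concrete, here is a small C model of the 15-byte clear discussed in this thread. It is a sketch under simplifying assumptions: the two planners are a simplified model of splitting a store into pieces (not GCC's actual by-pieces code), a store counts as misaligned when its address is not a multiple of its own width, and misaligned stores are tallied over the eight possible base-address residues mod 8, i.e. the uniform-distribution case. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One store in a plan: byte offset from the buffer start and width in bytes. */
struct store {
    size_t offset;
    size_t size;
};

typedef int (*planner)(size_t n, struct store *out);

/* Non-overlapping plan: greedily emit 8/4/2/1-byte stores (sd/sw/sh/sb). */
static int plan_pieces(size_t n, struct store *out) {
    int k = 0;
    size_t off = 0;
    assert(n >= 1 && n <= 16);
    for (size_t size = 8; size >= 1; size >>= 1) {
        while (n - off >= size) {
            out[k++] = (struct store){off, size};
            off += size;
        }
    }
    return k;
}

/* Overlapping plan: widest stores that fit, then one tail store shifted
   back so that it ends exactly at byte n, overlapping earlier bytes. */
static int plan_overlap(size_t n, struct store *out) {
    int k = 0;
    size_t off = 0, size = 8;
    assert(n >= 1 && n <= 16);
    while (size > n)            /* widest power-of-two store that fits */
        size >>= 1;
    while (n - off >= size) {
        out[k++] = (struct store){off, size};
        off += size;
    }
    size_t rem = n - off;       /* 0 <= rem < size */
    if (rem > 0) {
        size_t tail = 1;        /* narrowest power-of-two store covering rem */
        while (tail < rem)
            tail <<= 1;
        out[k++] = (struct store){n - tail, tail};
    }
    return k;
}

/* Count misaligned stores summed over base addresses with base % 8 == 0..7. */
static int misaligned_total(size_t n, planner plan) {
    struct store s[16];
    int k = plan(n, s), total = 0;
    for (size_t base = 0; base < 8; base++)
        for (int i = 0; i < k; i++)
            if ((base + s[i].offset) % s[i].size != 0)
                total++;
    return total;
}
```

For n = 15 the non-overlapping plan is sd/sw/sh/sb (4 stores, 17 misaligned across the eight residues) and the overlapping plan is two sd stores (2 stores, 14 misaligned). With an 8-byte-aligned base, though, the non-overlapping plan has no misaligned stores while the overlapping one has one, so whether this is a win depends on how aligned real buffers are and how fast the misaligned store is.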
If someone has benchmarks showing these are actually faster on the C906 (or even some documentation describing how these accesses are handled), then I'm happy to take the code (with the -Os bit fixed). It shouldn't be all that hard of a benchmark to run...

Would this be acceptable if it were a per-CPU knob, then? There seem to be existing OoO RV cores too!
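A per-CPU knob could look something like the following hypothetical sketch: a tuning table in which each core opts in to overlapping tail stores separately from its unaligned-access cost, so conservative in-order cores keep the old sequence. Every name here (`struct cpu_tune`, the table entries, `overlap_tail_stores`) is an illustrative assumption, not GCC's actual RISC-V tuning structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-core tuning entry; field names are illustrative only. */
struct cpu_tune {
    const char *name;
    bool slow_unaligned_access;   /* misaligned loads/stores are expensive */
    bool overlap_op_by_pieces;    /* core opts in to overlapping tail stores */
};

/* Hypothetical table: one conservative in-order entry, one OoO entry. */
static const struct cpu_tune tunes[] = {
    { "example-inorder", true,  false },
    { "example-ooo",     false, true  },
};

static const struct cpu_tune *find_tune(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof tunes / sizeof tunes[0]; i++)
        if (strcmp(tunes[i].name, name) == 0)
            return &tunes[i];
    return NULL;
}

/* Overlap only when the selected core opts in AND handles misaligned
   accesses quickly; unknown cores keep the non-overlapping sequence. */
static bool overlap_tail_stores(const char *cpu) {
    const struct cpu_tune *t = find_tune(cpu);
    return t != NULL && t->overlap_op_by_pieces && !t->slow_unaligned_access;
}
```

Keeping the opt-in separate from the unaligned-access cost means a core that is fast at misaligned accesses in general, but slow at the specific overlapping-store pattern, could still leave overlapping stores disabled.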

foo:
     sd    zero,0(a0)
     sw    zero,8(a0)
     sh    zero,12(a0)
     sb    zero,14(a0)

vs.

     sd    zero,0(a0)
     sd    zero,7(a0)
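In C terms, the two sequences above have roughly the memory effect of the following sketch, which clears 15 bytes either with four non-overlapping stores or with two 8-byte stores that both write byte 7. It assumes `memcpy` as the portable way to express a possibly misaligned store; a compiler is free to lower these calls differently, so this models what gets written, not the exact codegen. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Non-overlapping version: one store each of 8, 4, 2 and 1 bytes,
   mirroring the sd/sw/sh/sb sequence. */
static void zero15_pieces(unsigned char *p) {
    uint64_t d = 0;
    uint32_t w = 0;
    uint16_t h = 0;
    memcpy(p, &d, 8);        /* sd zero,0(a0)  */
    memcpy(p + 8, &w, 4);    /* sw zero,8(a0)  */
    memcpy(p + 12, &h, 2);   /* sh zero,12(a0) */
    p[14] = 0;               /* sb zero,14(a0) */
}

/* Overlapping version: bytes 0..7 and 7..14, so byte 7 is written twice.
   If p is 8-byte aligned, the second store is misaligned, which is only
   cheap on cores that handle misaligned accesses quickly. */
static void zero15_overlap(unsigned char *p) {
    uint64_t d = 0;
    memcpy(p, &d, 8);        /* sd zero,0(a0) */
    memcpy(p + 7, &d, 8);    /* sd zero,7(a0) */
}
```

Both functions leave the same 15 bytes zeroed and touch nothing outside them; the overlapping one simply trades two narrower stores for one redundant byte write.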

