Re: [PATCH v2] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

Palmer Dabbelt Tue, 24 May 2022 18:47:09 -0700

On Tue, 24 May 2022 18:36:27 PDT (-0700), Vineet Gupta wrote:



On 5/24/22 18:32, Palmer Dabbelt wrote:


Ping, IMO this needs to be (re)considered for trunk.
This goes really nicely with riscv_slow_unaligned_access_p==false, to
elide the unrolled tail copies for trailer word/sword/byte accesses.

@Kito, @Palmer ? Just from codegen pov this seems to be a no brainer


Has anything changed since this was posted?

IIRC the discussion essentially boiled down to that overlapping store
likely being a hard case on in-order machines (like the C906), but
there weren't any benchmarks or documentation so we could figure that
out.  I don't see how this is an obvious win: sure it's fewer ops (and
assuming a uniform distribution fewer misaligned accesses, though I
don't know how reasonable uniform distributions are here), but it's
only a small upside so that hard case would have to be fast in order
for this to be better code.

If someone has benchmarks showing these are actually faster on the
C906 (or even some documentation describing how these accesses are
handled) then I'm happy to take the code (with the -Os bit fixed).  It
shouldn't be all that hard of a benchmark to run...


Would this be acceptable if it were a per-CPU knob, then? There seem to
be existing OoO RV cores too!

It's being added as a per-cpu knob, it's just only being turned on for
the C906 and -Os tunings where it's not obviously a win.

I'm certainly not saying nobody builds this flavor of machine, certainly
Intel does as it's on for their machines, just that there's no solid
evidence the C906 behaves this way.  Given that this flag had been
explicitly discussed not to include generating misaligned accesses on
purpose during the Os discussions, I don't want to just flip it over on
a vendor and risk a performance regression.

The only other pipeline models are for in-order SiFive processors that
trap into M-mode for unaligned accesses, so this sort of thing doesn't
apply (though it's part of the reason -Os doesn't do this, as they're
still pretty common).

foo:
     sd    zero,0(a0)
     sw    zero,8(a0)
     sh    zero,12(a0)
     sb    zero,14(a0)

vs.

     sd    zero,0(a0)
     sd    zero,7(a0)
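
For reference, the two sequences above both clear 15 bytes (offsets 0
through 14); the overlapping form just writes byte 7 twice. A rough C
model of the two store patterns, as a sketch only: the function names
are made up for illustration, memcpy() stands in for a possibly
unaligned store, and this is not what GCC's by-pieces code looks like
internally.

```c
#include <assert.h>   /* only for the assert() in the usage note */
#include <stdint.h>
#include <string.h>

/* Non-overlapping pieces: 8 + 4 + 2 + 1 bytes, mirroring sd/sw/sh/sb. */
static void clear15_pieces(unsigned char *p)
{
    const uint64_t z8 = 0;
    const uint32_t z4 = 0;
    const uint16_t z2 = 0;

    memcpy(p + 0,  &z8, sizeof z8);   /* bytes 0..7   */
    memcpy(p + 8,  &z4, sizeof z4);   /* bytes 8..11  */
    memcpy(p + 12, &z2, sizeof z2);   /* bytes 12..13 */
    p[14] = 0;                        /* byte 14      */
}

/* Overlapping pieces: two 8-byte stores at offsets 0 and 7.
   Byte 7 is written twice; the second store is misaligned whenever
   p itself is 8-byte aligned. */
static void clear15_overlap(unsigned char *p)
{
    const uint64_t z8 = 0;

    memcpy(p + 0, &z8, sizeof z8);    /* bytes 0..7  */
    memcpy(p + 7, &z8, sizeof z8);    /* bytes 7..14 */
}

/* Returns 1 if both variants leave a 16-byte buffer in the same state
   and neither touches byte 15. */
static int same_result(void)
{
    unsigned char a[16], b[16];

    memset(a, 0xFF, sizeof a);
    memset(b, 0xFF, sizeof b);
    clear15_pieces(a);
    clear15_overlap(b);
    return memcmp(a, b, sizeof a) == 0 && a[15] == 0xFF && b[15] == 0xFF;
}
```

Calling assert(same_result()) from a test main() checks that the two
patterns are functionally identical. It says nothing about which one is
faster on a given core, which is the open question here.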
