Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

Vineet Gupta Tue, 02 Nov 2021 15:04:35 -0700

On 11/2/21 2:18 PM, Christoph Müllner wrote:

On Tue, Nov 2, 2021 at 9:15 PM Vineet Gupta wrote:

On 11/2/21 1:09 PM, Christoph Müllner wrote:

Without overlap_op_by_pieces we get:
     8e:   00053023                sd      zero,0(a0)
     92:   00052423                sw      zero,8(a0)
     96:   00051623                sh      zero,12(a0)
     9a:   00050723                sb      zero,14(a0)

To generate even the non-optimized code above with gcc 11 [1][2], what
do I need to do? Despite -mno-strict-align and trying -mtune={rocket,
sifive-7-series}, I only get the fully unrolled version.

You need a tuning struct with slow_unaligned_access == false.
Both Rocket and SiFive 7 have slow_unaligned_access set to true.
On mainline you have thead-c906, which would work.

But doesn't -mno-strict-align imply that?

Opposite direction.


Took me a while to unpack :-)

With `-mno-strict-align`, the emitted code may contain unaligned accesses
if `slow_unaligned_access == false`.
If `slow_unaligned_access == false`, then `-mstrict-align` will still
prevent unaligned accesses.
Usually, there is a good reason why `slow_unaligned_access` is set to
`true` (e.g. a significant penalty in case of unaligned accesses).
It wouldn't make sense to overrule this.


Sure, it makes sense, since this is fundamental to the uarch.

Because of the following snippet, unaligned-access codegen can only be made more restrictive, not less (and experimenting really requires a compiler rebuild):


  riscv_slow_unaligned_access_p = (cpu->tune_param->slow_unaligned_access
                   || TARGET_STRICT_ALIGN);

Thx,
-Vineet
