On 11/2/21 2:18 PM, Christoph Müllner wrote:
On Tue, Nov 2, 2021 at 9:15 PM Vineet Gupta <vine...@rivosinc.com> wrote:


On 11/2/21 1:09 PM, Christoph Müllner wrote:
Without overlap_op_by_pieces we get:
     8e:   00053023                sd      zero,0(a0)
     92:   00052423                sw      zero,8(a0)
     96:   00051623                sh      zero,12(a0)
     9a:   00050723                sb      zero,14(a0)
To generate even the non-optimized code above with gcc 11 [1][2], what
do I need to do? Despite -mno-strict-align and trying -mtune={rocket,
sifive-7-series}, I only get the fully unrolled version.
You need a tuning struct with slow_unaligned_access == false.
Both Rocket and SiFive 7 have slow unaligned access set to true.
On mainline you have thead-c906, which would work.
But doesn't -mno-strict-align imply that?
Opposite direction.

Took me a while to unpack :-)

With `-mno-strict-align`, emitted code might contain unaligned accesses
if `slow_unaligned_access == false`.
If `slow_unaligned_access == false`, then `-mstrict-align` will
prevent unaligned accesses.
Usually, there is a good reason why `slow_unaligned_access` is set to
`true` (e.g. a significant penalty
in case of unaligned accesses). It wouldn't make sense to overrule this.

Sure, it makes sense, since this is fundamental to the uarch.
Because of the following snippet, unaligned access codegen can only be made more restrictive, not less (and experimenting with it really requires a compiler rebuild):

  riscv_slow_unaligned_access_p = (cpu->tune_param->slow_unaligned_access
                                   || TARGET_STRICT_ALIGN);
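
To make the "more restrictive only" point concrete, here is a minimal standalone sketch of the same boolean logic. The function name `effective_slow_unaligned` and its two parameters are hypothetical stand-ins for the tuning-struct field and TARGET_STRICT_ALIGN; this is an illustration, not GCC code.

```c
#include <stdbool.h>

/* Hypothetical model of the quoted expression.
   tune_slow    : stands in for cpu->tune_param->slow_unaligned_access
   strict_align : stands in for TARGET_STRICT_ALIGN (-mstrict-align)
   Returns the effective "unaligned access is slow" predicate.  */
bool effective_slow_unaligned(bool tune_slow, bool strict_align)
{
    /* Logical OR: either input alone forces the result to true,
       so the command-line flag can never turn a true tuning
       value back into false.  */
    return tune_slow || strict_align;
}
```

With tune_slow == true (as for the Rocket and SiFive 7 tunings mentioned above), the result is true whether or not strict alignment is requested, so -mno-strict-align has no effect there. Only a tuning with the field set to false lets the flag decide.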

Thx,
-Vineet
