On Tue, Nov 2, 2021 at 9:15 PM Vineet Gupta <vine...@rivosinc.com> wrote:
>
> On 11/2/21 1:09 PM, Christoph Müllner wrote:
> >>>> Without overlap_op_by_pieces we get:
> >>>>   8e: 00053023   sd zero,0(a0)
> >>>>   92: 00052423   sw zero,8(a0)
> >>>>   96: 00051623   sh zero,12(a0)
> >>>>   9a: 00050723   sb zero,14(a0)
> >> To generate even the non optimized code above with gcc 11 [1][2], what
> >> do I need to do. Despite -mno-strict-align and trying -mtune={rocket,
> >> sifive-7-series}, I only get the fully unrolled version
> > You need a tuning struct with slow_unaligned_access == false.
> > Both, Rocket and Sifive 7, have slow unaligned access set to true.
> > Mainline you have thead-c906 which would work.
>
> But doesn't -mno-strict-align imply that ?
Opposite direction. With `-mno-strict-align`, emitted code might contain unaligned accesses if `slow_unaligned_access == false`. Even if `slow_unaligned_access == false`, `-mstrict-align` will prevent unaligned accesses. Usually, there is a good reason why `slow_unaligned_access` is set to `true` (e.g. a significant penalty in case of unaligned accesses). It wouldn't make sense to overrule this.

>
> Thx,
> -Vineet