On 11/2/21 1:09 PM, Christoph Müllner wrote:
Without overlap_op_by_pieces we get:
8e: 00053023 sd zero,0(a0)
92: 00052423 sw zero,8(a0)
96: 00051623 sh zero,12(a0)
9a: 00050723 sb zero,14(a0)
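For reference, the four stores above cover 8 + 4 + 2 + 1 = 15 bytes, so a 15-byte zeroing memset is one plausible source for them. The snippet below is only a sketch of such a reproducer: the function name clear15 and the exact size are assumptions, not the test case actually used in this thread.

```c
#include <string.h>

/* Hypothetical reproducer; the name clear15 and the size 15 are
   assumptions inferred from the store sequence, not from the thread. */
void clear15(void *p)
{
    /* 15 = 8 + 4 + 2 + 1, which matches one sd, one sw, one sh and
       one sb of the zero register when unaligned stores are cheap. */
    memset(p, 0, 15);
}
```

Compiled with optimization and a tuning whose slow_unaligned_access is false, something like this should expand by pieces into a short run of stores. With a tuning that marks unaligned access as slow, a byte-wise unrolled sequence is the expected result.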
To generate even the non-optimized code above with gcc 11 [1][2], what
do I need to do? Despite -mno-strict-align and trying -mtune={rocket,
sifive-7-series}, I only get the fully unrolled version.
You need a tuning struct with slow_unaligned_access == false.
Both Rocket and SiFive 7 have slow_unaligned_access set to true.
On mainline there is thead-c906, which would work.
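One way to picture why the tuning struct matters even when -mno-strict-align is given: the behaviour reported in this thread is consistent with the effective setting being the OR of the tuning's flag and strict alignment. The code below is a simplified model under that assumption, with hypothetical names (tune_model, effective_slow_unaligned). It is not GCC's actual source.

```c
#include <stdbool.h>

/* Simplified stand-in for a tuning struct; it models only the one
   field named in this thread. The type name is hypothetical. */
struct tune_model {
    bool slow_unaligned_access;
};

/* Assumed rule: either the tuning or strict alignment can force the
   slow path, so turning strict alignment off cannot override a tuning
   that marks unaligned access as slow. */
static bool effective_slow_unaligned(const struct tune_model *tune,
                                     bool strict_align)
{
    return tune->slow_unaligned_access || strict_align;
}
```

Under this model, a Rocket-like tuning (flag set to true) stays slow regardless of the strict-align setting, while a tuning with the flag set to false becomes slow only when strict alignment is requested.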
But doesn't -mno-strict-align imply that?
Thx,
-Vineet