On 11/2/21 1:09 PM, Christoph Müllner wrote:
>>> Without overlap_op_by_pieces we get:
>>>   8e: 00053023  sd zero,0(a0)
>>>   92: 00052423  sw zero,8(a0)
>>>   96: 00051623  sh zero,12(a0)
>>>   9a: 00050723  sb zero,14(a0)
>>
>> To generate even the non-optimized code above with gcc 11 [1][2], what do
>> I need to do? Despite -mno-strict-align and trying
>> -mtune={rocket, sifive-7-series}, I only get the fully unrolled version.
>
> You need a tuning struct with slow_unaligned_access == false.
> Both Rocket and SiFive 7 have slow unaligned access set to true.
> In mainline you have thead-c906, which would work.

But doesn't -mno-strict-align imply that?

Thx,
-Vineet
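
For reference, a minimal sketch of the kind of test case that could produce the store sequence quoted above. It assumes the code under discussion zeroes 15 bytes, which matches the quoted stores (sd + sw + sh + sb = 8 + 4 + 2 + 1 bytes at offsets 0, 8, 12, 14). The function name zero15 is hypothetical and does not come from the thread, and the compile line in the comment is an assumed invocation, not one quoted here.

```c
#include <string.h>

/*
 * Hypothetical reproducer: zero a 15-byte region.
 * Assumed invocation (toolchain name and flag combination are assumptions):
 *   riscv64-unknown-elf-gcc -O2 -mno-strict-align -mtune=thead-c906 -S zero15.c
 * With a tuning whose slow_unaligned_access is false, the compiler may
 * expand the memset by pieces (8 + 4 + 2 + 1 bytes) instead of emitting
 * fifteen single-byte stores.
 */
void zero15(char *dst)
{
    memset(dst, 0, 15);
}
```

Whether the by-pieces expansion is chosen depends on the selected tuning, which is the point of the question above.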
