https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111502
Andrew Waterman <andrew at sifive dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew at sifive dot com

--- Comment #1 from Andrew Waterman <andrew at sifive dot com> ---
This isn't actually a bug. Quoting the RVA profile spec, "misaligned loads and stores might execute extremely slowly"--which is code for the possibility that they might be trapped and emulated, taking hundreds of clock cycles apiece. So the default behavior of emitting byte accesses is best when generating generic code. (Of course, when tuning for a particular microarchitecture, the shorter code sequence may be emitted.)
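
To make the trade-off concrete, here is a minimal C sketch of the kind of access under discussion. It is not the test case from this report; the function names are hypothetical and chosen only for illustration. The first function is a portable misaligned load via memcpy, which leaves the choice of instruction sequence to the compiler. The second spells out, for a little-endian target, roughly what a byte-access sequence computes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable 32-bit load from a possibly misaligned
 * address. The compiler may lower this to one word load or to several
 * byte loads, depending on the target and tuning options. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: the same load written as four byte accesses,
 * assembled in little-endian order. This is approximately the shape of
 * the "byte accesses" sequence for generic code. */
uint32_t load_u32_bytewise_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

On RISC-V, GCC provides -mstrict-align and -mno-strict-align to control whether misaligned accesses are avoided. Which default a particular -mtune or -mcpu value selects is something to check against the GCC version in use rather than assume.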