https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111502
--- Comment #6 from Andrew Waterman <andrew at sifive dot com> ---
Ack, I misunderstood your earlier message. You're of course right that the load/load/shift/or sequence is preferable to the load/load/store/store/load sequence on just about any practical implementation. That the memcpy version is optimized less well does seem to be a separate matter from the issue Andrew mentioned.
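
For readers following along, the two code shapes being compared can be sketched in C roughly as below. This is an assumed illustration, not the test case from the bug report: the function names are hypothetical, and a 16-bit little-endian load is assumed only because a two-byte access matches the load/load/shift/or pattern named above. Which instruction sequence a given GCC version actually emits for each function depends on the target and flags, so check the generated assembly rather than relying on this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 16-bit load written with memcpy.
 * On a strict-alignment target the compiler may expand this through a
 * temporary, which is the kind of load/load/store/store/load shape the
 * comment refers to (assumption; verify with -S). */
uint16_t load16_memcpy(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);   /* result follows host byte order */
    return v;
}

/* Hypothetical helper: the same load written explicitly as two byte
 * loads combined with a shift and an or (little-endian byte order). */
uint16_t load16_shift_or(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}
```

On a little-endian host both helpers return the same value for the same pointer, so the difference under discussion is purely in the generated code, not in the result.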