https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252
--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that one thing that might help is to define an alternative for bswap that takes a memory operand and do the load that way. That will definitely help in the original code.
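
For illustration only, here is a minimal C sketch of the kind of source where a memory-operand alternative for bswap could matter: a value is loaded from memory and immediately byte-swapped. The function name `load_bswap32` is hypothetical and is not taken from the original testcase. `__builtin_bswap32` is the GCC builtin. Whether the load and the swap actually get combined depends on the target's machine description.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original testcase): load a 32-bit
   value from memory, then reverse its byte order.  With a bswap
   alternative that accepts a memory operand, the backend could match
   the load and the swap together instead of loading into registers
   first.  */
uint32_t load_bswap32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);      /* load from memory without alignment assumptions */
    return __builtin_bswap32(v);  /* GCC builtin: reverse the four bytes */
}
```

Comparing the assembly from `-O2 -S` before and after adding such an alternative would show whether the separate load goes away.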