https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---

Note I do understand what you are saying, just the middle-end in detecting and using __builtin_bswap32 does what it does everywhere else - it checks whether the target implements the operation.

The middle-end doesn't try to actually compare costs (it has no idea of the bswapsi costs), and it most definitely doesn't see how AVR is special in having only QImode registers and thus the created SImode load (which the target supports!) will end up as four registers.

To me a 'bswap' on AVR never makes sense since whatever is swapped will be _always_ available as a set of byte registers. That's why I question AVR exposing bswapsi to the middle-end rather than suggesting the middle-end should maybe see whether AVR has any regs of HImode or larger. Note that would break for targets that could eventually do a load-multiple byteswapped to a set of QImode regs (guess there's no such one in GCC at least), but it's the only heuristic that might work here.

The only thing that maybe would make sense with AVR exposing bswapsi is users calling __builtin_bswap but since it always expands as a libcall even that makes no sense.

So my preferred fix would be to remove bswapsi from avr.md? Does it benefit from recognizing bswap done with shifts on an int?
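
For readers following along, here is a rough sketch of the kinds of source patterns under discussion. It is not the test case from this bug; the function names (swap32_shifts, swap32_builtin, load_be32) are made up for illustration, and whether a given GCC version actually turns the first and third forms into a bswap depends on the optimization level and on the target providing bswapsi.

```c
#include <stdint.h>

/* A byte swap written by hand with shifts and masks ("bswap done with
   shifts on an int"). The middle-end may recognize this as a 32-bit
   byte swap when the target advertises one. */
uint32_t swap32_shifts(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* The explicit user-level builtin mentioned in the comment. */
uint32_t swap32_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Assembling a 32-bit value from four byte loads, most significant
   byte first. On a little-endian target this is the sort of code that
   can be rewritten as one 32-bit (SImode) load followed by a bswap.
   The casts matter on AVR, where int is only 16 bits wide. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
           (uint32_t)p[3];
}
```

On a target with only byte-wide registers, each of these results already lives in four separate registers, so the "swap" amounts to picking the bytes in a different order, which is the point being made above.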