https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102926
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I would also guess that load_tp_hard is slower, and the long mnemonic suggests a larger instruction (OK, but we're RISC and thus have fixed-size instructions?).