https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I wonder if the issue here is not __builtin_memcpy itself but rather how we optimize storing into the lower bytes of a uint64_t. I think your benchmark is not benchmarking what you think it is benchmarking.
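
To make the distinction concrete, here is a minimal sketch of the two things that might be getting measured. It is not taken from the benchmark in this report: the function names and the 5-byte size are assumptions chosen only for illustration. The first function does a partial memcpy into a zeroed uint64_t (which fills the low-order bytes only on a little-endian target); the second builds the same low-order bytes with shifts and no memcpy at all.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 5 bytes into the start of a zeroed uint64_t.
   On a little-endian target these land in the low-order bytes. */
uint64_t load5_memcpy(const unsigned char *src)
{
    uint64_t v = 0;
    memcpy(&v, src, 5);   /* partial store into a wider object */
    return v;
}

/* Hypothetical helper: assemble the same 5 bytes as the low-order
   bytes of the result using shifts, independent of host endianness. */
uint64_t load5_shift(const unsigned char *src)
{
    uint64_t v = 0;
    for (int i = 0; i < 5; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}
```

If the generated code for both functions is similarly slow, that would suggest the cost comes from how the partial store into the uint64_t is handled rather than from the memcpy expansion.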