https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114080
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, most not too old compilers handle small constant size memcpy as an
efficient way to load/store unaligned values and it is also portable.
So, instead of
  *dstp = *srcp ^ *bufp;
if all those can be unaligned use
  __uint128_t t1, t2;
  memcpy (&t1, srcp, sizeof (t1));
  memcpy (&t2, bufp, sizeof (t2));
  t1 = t1 ^ t2;
  memcpy (dstp, &t1, sizeof (t1));
should result in decent code (unless -O0 of course).
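
For illustration only, the memcpy pattern from the comment could be wrapped in a small function like the one below. The function name xor_block16 and the use of unsigned char pointers are assumptions made for this sketch and do not come from the bug report; __uint128_t is a GCC/Clang extension that is typically available only on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions, not from the
   bug report): XOR the 16 bytes at src with the 16 bytes at buf and
   store the result at dst. None of the pointers has to be aligned. */
void xor_block16(unsigned char *dst, const unsigned char *src,
                 const unsigned char *buf)
{
    __uint128_t t1, t2;            /* GCC/Clang extension */

    /* Documents the assumption that the temporary is exactly 16 bytes. */
    assert(sizeof(t1) == 16);

    /* Constant-size memcpy calls act as unaligned loads. */
    memcpy(&t1, src, sizeof(t1));
    memcpy(&t2, buf, sizeof(t2));

    t1 ^= t2;

    /* Constant-size memcpy acts as an unaligned store. */
    memcpy(dst, &t1, sizeof(t1));
}
```

Because both inputs are copied into local temporaries before the store, this sketch also works when dst is the same pointer as src or buf. With optimization enabled, compilers usually turn the three memcpy calls into plain load and store instructions, but that is worth confirming in the generated assembly for the target at hand.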