https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
The following testcase reproduces the assembly:

typedef __UINT64_TYPE__ uint64_t;
void poly_double_le2 (unsigned char *out, const unsigned char *in)
{
  uint64_t W[2];
  __builtin_memcpy (&W, in, 16);
  uint64_t carry = (W[1] >> 63) * 135;
  W[1] = (W[1] << 1) ^ (W[0] >> 63);
  W[0] = (W[0] << 1) ^ carry;
  __builtin_memcpy (out, &W[0], 8);
  __builtin_memcpy (out + 8, &W[1], 8);
}
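
For anyone who wants to check what the testcase computes before looking at the generated code, here is a minimal sketch of a semantic check. It is not part of the original comment. It restates the function with <stdint.h> and memcpy instead of the GCC builtins, which is fine for checking results but is not guaranteed to give the same assembly. The helpers double_words and check_poly_double_le2 are hypothetical names introduced only for this sketch. The function treats the 16 input bytes as a 128-bit value held in two 64-bit words, shifts it left by one bit, and XORs 135 (0x87) into the low word when the top bit was set. That looks like the usual doubling step in GF(2^128), though the comment does not say so.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same computation as the testcase, using standard headers
   (assumption: this is only for checking results, not codegen). */
void poly_double_le2(unsigned char *out, const unsigned char *in)
{
    uint64_t W[2];
    memcpy(W, in, 16);
    uint64_t carry = (W[1] >> 63) * 135;   /* 135 == 0x87 */
    W[1] = (W[1] << 1) ^ (W[0] >> 63);
    W[0] = (W[0] << 1) ^ carry;
    memcpy(out, &W[0], 8);
    memcpy(out + 8, &W[1], 8);
}

/* Hypothetical helper: apply the function to two 64-bit words.
   Going through memcpy both ways keeps the check independent of
   the host byte order. */
void double_words(uint64_t w[2])
{
    unsigned char in[16], out[16];
    memcpy(in, w, 16);
    poly_double_le2(out, in);
    memcpy(w, out, 16);
}

/* Hypothetical self-check covering the three interesting cases. */
int check_poly_double_le2(void)
{
    /* Plain shift, no carries involved. */
    uint64_t a[2] = { 1, 0 };
    double_words(a);
    assert(a[0] == 2 && a[1] == 0);

    /* Top bit of the low word moves into the high word. */
    uint64_t b[2] = { UINT64_C(1) << 63, 0 };
    double_words(b);
    assert(b[0] == 0 && b[1] == 1);

    /* Top bit of the high word folds back in as 135. */
    uint64_t c[2] = { 0, UINT64_C(1) << 63 };
    double_words(c);
    assert(c[0] == 135 && c[1] == 0);

    return 0;
}
```

Calling check_poly_double_le2 () from a small main and building at both -O2 and -O3 gives a quick way to confirm that the two optimization levels agree on the result, so that any difference is purely in the quality of the generated code.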