https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #10)
> > where the XOR16 is implemented as:
> >
> > #define XORN(in1,in2,out,len) \
> > do { \
> > uint _i; \
> > for (_i = 0; _i < len/sizeof(ulong); ++_i) \
> > *((ulong*)(out)+_i) = *((ulong*)(in1)+_i) ^ *((ulong*)(in2)+_i); \
> > } while(0)
>
> I can confirm that changing that to:
>
> #define XORN(in1, in2, out, len) \
> do \
> { \
> uint _i; \
> for (_i = 0; _i < len; ++_i) \
> *(out + _i) = *(in1 + _i) ^ *(in2 + _i); \
> } while (0)
>
> fixes the problem. It seems very close to what I saw here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201#c13
It depends on whether those arrays are stored as ulong, or will later be read
as ulong, or something else.
One could also use typedef ulong ulong_alias __attribute__((may_alias));
and use ulong_alias * above, or memcpy to/from ulong temporaries.
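
The two alternatives suggested above can be sketched as follows. This is a
minimal illustration, not code from the bug report: the names XORN_ALIAS and
xorn_memcpy are made up for the example, and uint/ulong are spelled out as
unsigned int/unsigned long. The may_alias attribute is GCC-specific; the
memcpy variant is portable C.

```c
#include <stddef.h>
#include <string.h>

/* may_alias tells GCC this type may alias objects of any other type,
   so strict-aliasing-based optimizations are disabled for accesses
   through it. */
typedef unsigned long __attribute__((may_alias)) ulong_alias;

/* Variant 1: same word-at-a-time loop as the original XORN, but with
   may_alias pointers instead of plain ulong* casts. */
#define XORN_ALIAS(in1, in2, out, len)                                   \
  do {                                                                   \
    unsigned int _i;                                                     \
    for (_i = 0; _i < (len) / sizeof(ulong_alias); ++_i)                 \
      *((ulong_alias *)(out) + _i) =                                     \
          *((ulong_alias *)(in1) + _i) ^ *((ulong_alias *)(in2) + _i);   \
  } while (0)

/* Variant 2: memcpy to/from ulong temporaries; well-defined regardless
   of the buffers' declared types, and GCC typically lowers the memcpys
   to plain word loads/stores. */
static void xorn_memcpy(void *out, const void *in1, const void *in2,
                        size_t len)
{
  size_t i;
  for (i = 0; i + sizeof(unsigned long) <= len; i += sizeof(unsigned long)) {
    unsigned long a, b;
    memcpy(&a, (const char *)in1 + i, sizeof a);
    memcpy(&b, (const char *)in2 + i, sizeof b);
    a ^= b;
    memcpy((char *)out + i, &a, sizeof a);
  }
}
```

Note that the may_alias variant still assumes the buffers are suitably
aligned for ulong and that len is a multiple of sizeof(ulong); the memcpy
variant only handles the whole-word prefix and would need a byte-wise tail
loop for arbitrary lengths.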