>> > for (i = 0; i < lim; i++) {
>> > - xts_tweak_encdec(datactx, decfunc, src, dst, (uint8_t *)&T);
>> > + xts_uint128 S, D;
>> > +
>> > + memcpy(&S, src, XTS_BLOCK_SIZE);
>> > + xts_tweak_encdec(datactx, decfunc, &S, &D, &T);
>> > + memcpy(dst, &D, XTS_BLOCK_SIZE);
>>
>> Why do you need S and D?
>
> I think src & dst pointers can't be guaranteed to be aligned
> sufficiently for int64 operations if we just cast from uint8_t *.
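To spell out the concern: in C, converting a uint8_t * to a uint64_t * and dereferencing it is undefined behaviour if the address is not suitably aligned, and reading a byte array through a uint64_t lvalue also breaks the strict aliasing rules. Copying each block into an aligned local with memcpy() avoids both. A minimal sketch, assuming a hypothetical block128 union as a stand-in for xts_uint128 (whose definition isn't quoted here) and BLOCK_SIZE as a stand-in for XTS_BLOCK_SIZE:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16 /* hypothetical stand-in for XTS_BLOCK_SIZE */

/* Hypothetical 128-bit block type; the uint64_t member gives the union
 * uint64_t alignment, so 64-bit accesses to its fields are always safe. */
typedef union {
    uint8_t b[BLOCK_SIZE];
    uint64_t u[2];
} block128;

/* XOR one 16-byte block from src with the tweak T and write it to dst.
 * src and dst may have any alignment: memcpy() moves the bytes into and
 * out of aligned locals, so the 64-bit XORs never touch a misaligned
 * address and never access the byte buffers through a uint64_t lvalue. */
static void xor_block_unaligned(uint8_t *dst, const uint8_t *src,
                                const block128 *T)
{
    block128 S, D;

    memcpy(&S, src, BLOCK_SIZE);
    D.u[0] = S.u[0] ^ T->u[0];
    D.u[1] = S.u[1] ^ T->u[1];
    memcpy(dst, &D, BLOCK_SIZE);
}
```

With a constant size of 16, GCC and Clang commonly lower each memcpy() to plain loads and stores, so the copies are often cheap in practice.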

I see. I did a quick test without the memcpy() calls and it doesn't seem
to have a visible effect on performance, but if it turns out that it
does, this may be worth investigating further. I suspect all buffers
received by this code are allocated with qemu_try_blockalign() anyway,
so it should be safe.
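If the assumption that every buffer comes from an aligned allocator ever needs confirming, a cheap runtime check would do it. A sketch assuming C11; xts_bufs_aligned() and alloc_aligned() are hypothetical helpers, the latter a stand-in for qemu_try_blockalign() built on aligned_alloc():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of align; align must be a power of two. */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: true if src and dst are both aligned well enough
 * to be accessed as uint64_t words. */
static bool xts_bufs_aligned(const void *src, const void *dst)
{
    return is_aligned(src, _Alignof(uint64_t)) &&
           is_aligned(dst, _Alignof(uint64_t));
}

/* Hypothetical stand-in for qemu_try_blockalign(): returns at least len
 * bytes aligned to align, or NULL. len is rounded up to a multiple of
 * align because C11 aligned_alloc() requires it. */
static void *alloc_aligned(size_t align, size_t len)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return aligned_alloc(align, (len + align - 1) / align * align);
}
```

Even when the check passes, a direct cast still relies on the build tolerating aliasing violations, so a check like this is probably most useful as a debug assertion of the assumption rather than as a way to pick a faster path.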

Berto