Thanks Robin.

- IMHO we need to check both series for overflow, if step2 overflows in the
smaller type isn't the result equally wrong?

The series2 will shift right before IOR, thus the overflow bits never effect
on the final result.
For example, the series2 will be similar as below after shift.

v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}

Hmm, I think it's the other way around and it is being shifted left. In that case we're only keeping the lower bits and the "overflow bits" don't matter. That means we should indeed be OK here.

For i32 interleave, we will extend to i64 for series1 and series2, I think we
can design
a series like base + i * step with last element overflow to bit 65 but have
bits 32-64 all zero, and then
the element before the last one will have overflow bits pollute the result.

It seems checking the last 2 elements is good enough here.

I guess the worst that can happen theoretically is i = 2 ^ 32 - 1,
step = 2 ^ 32 - 1, and base = 2 ^ 32 - 1?  But i is bound by our "largest"
mode so IMHO a 64-bit value should be sufficient.

We always have at least two elements BTW, no need to check that.

Please also adjust the comment above the code regarding overflow.

We need to either forbid variable-length vectors for this scheme altogether or
determine the maximum runtime number of elements possible for the current mode.
We don't support more than 65536 bits in a vector which would "naturally" limit
the number of elements.  I suppose variable-length vectors are not too common
for this scheme and falling back to a less optimized one shouldn't be too
costly.  That's just a gut feeling, though.

Make sense, will fall back to less optimized for VLA.

Have you checked how the more generic version (last else branch) compares? I could imagine it's close or even better for our case.

Reply via email to