https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124037
--- Comment #7 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
My current reasoning on the proposed fix and expected resultant behavior of
trying to vectorize reads from non power-of-2 sized structs:
Enabling scalarized reads from structs whose sizes are not powers of 2
and where not all scalar iterations are known to be in bounds.
-----------------------------------------------------------------------
At the moment, these reads are unsafe because the alignment
requirement checked for vectorization is too weak.
As reported in PR123588, the example vectorized memory access
exceeding 364 bytes only checks for 32-byte alignment as the
individual vectorized loop iteration scalarizes its vector memory
access to individual 32-byte chunk reads. Where all scalar iterations
are known to be within bounds this is not a problem. Even if a loop
iteration crosses page boundaries, knowing all scalar reads to be safe
means that reads beyond the current page will not try to read from
unallocated memory, a situation which would otherwise trigger a
segfault.
Given this situation of unknown bounds and using PR123588 as our
example, we'd require all 364 bytes to be within a single cache
line in every iteration; in more general terms, an alignment
equal to the total read size would be required. The
reason for the inadequate 32-byte alignment check, however, comes
from the following section of code:
  if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
      || (!vf.is_constant () && pow2p_hwi (align_factor_c)))
    vector_alignment = new_alignment;
where if the total read size, given by `vf * DR_GROUP_SIZE
(DR_GROUP_FIRST_ELEMENT (stmt_info))', is not a power of 2 we don't
update the alignment requirement to reflect the vectorized read
size. For a non-scalarized vector access, this is presumably not an
issue as vector loads are expected to be a power of 2 anyway and non
power-of-2 loads will ultimately be rejected.
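To illustrate the effect of that guard, here is a minimal sketch of
its constant-VF branch only. `is_pow2' and `guarded_alignment' are
hypothetical stand-ins written for this comment, not GCC functions,
and the values are treated as plain integers rather than poly_ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for pow2p_hwi: true iff X is a power of 2.  */
static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical simplification of the constant-VF branch of the guard:
   the alignment requirement is only raised to NEW_ALIGNMENT when that
   value is a power of 2; otherwise the old requirement is kept.  */
static uint64_t
guarded_alignment (uint64_t vector_alignment, uint64_t new_alignment)
{
  assert (is_pow2 (vector_alignment));
  if (is_pow2 (new_alignment))
    vector_alignment = new_alignment;
  return vector_alignment;
}
```

With the numbers used above, `guarded_alignment (32, 364)' leaves the
requirement at 32, since 364 is not a power of 2.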
Suppose we set `vector_alignment' to `HOST_WIDE_INT_1U << ceil_log2
(new_alignment.to_constant ())' and set the access type to
`dr_aligned'. This would potentially generate a run-time check for
this alignment value which would guarantee our starting address to be
aligned to a power of 2. Would this be necessary and sufficient to
ensure that all subsequent scalarized loads are safe?
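For reference, the rounding step `HOST_WIDE_INT_1U << ceil_log2 (x)'
amounts to taking the smallest power of 2 that is not below x. A
small sketch, where `round_up_pow2' is a hypothetical helper name and
not part of GCC:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of 2 that is >= X, i.e. the value of
   1 << ceil_log2 (X).  Hypothetical helper for illustration.  */
static uint64_t
round_up_pow2 (uint64_t x)
{
  /* Reject 0 and values whose result would not fit in 64 bits.  */
  assert (x != 0 && x <= (UINT64_C (1) << 63));
  uint64_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
}
```

For the 364-byte example this gives `round_up_pow2 (364) == 512'.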
- Suppose also that some group of N iterations of vectorized accesses
is guaranteed not to cross a page boundary.
- If our individual object size is not a power of 2, there is no value
for N whereby after N accesses the address increment will add up to
a power of 2, a condition necessary to ensure that after the initial
N accesses we are again aligned at a power of 2, such that any
subsequent group of N accesses will also fall within a page boundary.
- This can be proved by contradiction. If obj_size * N = 2^M for some
M, then obj_size divides 2^M. The only prime factor of 2^M is 2, so
every divisor of 2^M is itself a power of 2. This implies that our
object size is a power of 2, which contradicts our initial
assumption that `obj_size' is not a power of 2.
- This unmet necessary precondition for ensuring access safety
will be flagged as such by the `multiple_p (target_alignment, read_amount)'
check in `get_load_store_type', causing the function to return false
and vectorization of the loop to fail.
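The two claims in the list above can be sanity-checked with a small
sketch. Everything here is illustrative: `some_multiple_is_pow2' and
`is_multiple' are hypothetical helpers, the latter being only a plain
integer stand-in for the `multiple_p (a, b)' test, and the sizes are
example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True iff X is a power of 2.  */
static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Brute-force search: is there any N in [1, MAX_N] for which
   OBJ_SIZE * N is a power of 2?  Callers must keep OBJ_SIZE * MAX_N
   below 2^64 to avoid wrap-around.  */
static bool
some_multiple_is_pow2 (uint64_t obj_size, uint64_t max_n)
{
  assert (obj_size != 0);
  for (uint64_t n = 1; n <= max_n; n++)
    if (is_pow2 (obj_size * n))
      return true;
  return false;
}

/* Integer stand-in for multiple_p (A, B): is A a multiple of B?  */
static bool
is_multiple (uint64_t a, uint64_t b)
{
  assert (b != 0);
  return a % b == 0;
}
```

For a non-power-of-2 size such as 12 or 364 the search finds no such
N, whereas for 64 it succeeds immediately. Likewise
`is_multiple (512, 364)' is false, which is the shape of the check
that would reject the rounded-up alignment for a 364-byte read.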