https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124037

--- Comment #7 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
My current reasoning on the proposed fix and expected resultant behavior of
trying to vectorize reads from non power-of-2 sized structs:

Enabling scalarized reads from structs whose sizes are not powers of 2
and where not all scalar iterations are known to be inbounds.
-----------------------------------------------------------------------

At the moment, these reads are unsafe because the alignment
requirement imposed for vectorization is too weak.

As reported in PR123588, the example vectorized memory access
exceeding 364 bytes only checks for 32-byte alignment, as each
vectorized loop iteration scalarizes its vector memory access into
individual 32-byte chunk reads. Where all scalar iterations are known
to be within bounds this is not a problem. Even if a loop iteration
crosses a page boundary, knowing all scalar reads to be safe means
that reads beyond the current page will not touch unallocated memory,
a situation which would otherwise trigger a segfault.
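
To make the hazard concrete, here is a minimal sketch in C. It is not
GCC code: `crosses_page' and `is_aligned' are hypothetical helpers, and
the 4096-byte page size and the start address 4064 are assumptions
chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: true if the byte range
   [ADDR, ADDR + SIZE) touches more than one page of PAGE_SIZE bytes.  */
static bool
crosses_page (uint64_t addr, uint64_t size, uint64_t page_size)
{
  assert (size > 0 && page_size > 0);
  return addr / page_size != (addr + size - 1) / page_size;
}

/* Hypothetical helper: true if ADDR is a multiple of ALIGN.  */
static bool
is_aligned (uint64_t addr, uint64_t align)
{
  assert (align > 0);
  return addr % align == 0;
}
```

With these definitions, `is_aligned (4064, 32)' holds, yet
`crosses_page (4064, 364, 4096)' is also true: a start address that
passes the 32-byte check can still have its 364-byte read spill into
the next page. Only the first 32-byte chunk,
`crosses_page (4064, 32, 4096)', stays within the page.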

Given this situation of unknown bounds and using PR123588 as our
example, we'd require all 364 bytes to be within a single cache
line in every iteration; in more general terms, an alignment
equivalent to the total read size would be required. The reason for
the inadequate 32-byte alignment check, however, comes from the
following section of code:

  if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
    || (!vf.is_constant () && pow2p_hwi (align_factor_c)))
    vector_alignment = new_alignment;

where, if the total read size, given by `vf * DR_GROUP_SIZE
(DR_GROUP_FIRST_ELEMENT (stmt_info))', is not a power of 2, we don't
update the alignment requirement to reflect the vectorized read
size. For a non-scalarized vector access, this is presumably not an
issue, as vector loads are expected to be a power of 2 anyway and
non-power-of-2 loads will ultimately be rejected.

Suppose we set `vector_alignment' to `HOST_WIDE_INT_1U << ceil_log2
(new_alignment.to_constant ())' and set the access type to
`dr_aligned'. This would potentially generate a run-time check for
this alignment value, which would guarantee our starting address to be
aligned to a power of 2. Would this be necessary and sufficient to
ensure that all subsequent scalarized loads are safe?
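
As a rough sketch of what the proposed rounding yields, the C below
mimics the arithmetic outside of GCC. `ceil_log2_u64' is only a
stand-in for GCC's `ceil_log2', and `round_up_pow2' and
`read_within_aligned_block' are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GCC's ceil_log2: smallest L such that 2^L >= X.  */
static unsigned
ceil_log2_u64 (uint64_t x)
{
  assert (x > 0 && x <= (UINT64_C (1) << 63));
  unsigned l = 0;
  while ((UINT64_C (1) << l) < x)
    l++;
  return l;
}

/* Hypothetical: the power-of-2 alignment the proposal would request
   for a total read of READ_SIZE bytes.  */
static uint64_t
round_up_pow2 (uint64_t read_size)
{
  return UINT64_C (1) << ceil_log2_u64 (read_size);
}

/* Hypothetical: true if a read of READ_SIZE bytes starting at ADDR
   stays inside the ALIGN-sized block that contains ADDR.  */
static bool
read_within_aligned_block (uint64_t addr, uint64_t read_size,
                           uint64_t align)
{
  assert (align > 0 && read_size > 0);
  return addr / align == (addr + read_size - 1) / align;
}
```

For the 364-byte read, `round_up_pow2 (364)' gives 512, and a read
starting at a 512-aligned address such as 1024 satisfies
`read_within_aligned_block (1024, 364, 512)'. This only speaks to the
first access after the run-time check, not to later iterations.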

- Suppose also that some group of N iterations of vectorized accesses
  is guaranteed not to cross a page boundary.
- If our individual object size is not a power of 2, there is no value
  for N whereby after N accesses the address increment will add up to
  a power of 2, a condition necessary to ensure that after the initial
  N accesses we are again aligned at a power of 2, such that any
  subsequent group of N accesses will also fall within a single page.
- This can be proved by contradiction. If obj_size * N = 2^M for some
  arbitrary M, then obj_size divides 2^M. The only divisors of 2^M
  are themselves powers of 2, so obj_size = 2^K for some K <= M. This
  contradicts our initial assumption that `obj_size' is not a power
  of 2.
- This unmet necessary precondition for ensuring access safety will be
  flagged as such by the `multiple_p (target_alignment, read_amount)'
  check in `get_load_store_type', causing the function to return false
  and vectorization of the loop to fail.
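
The argument in the list above can be checked numerically with a small
C sketch. None of this is GCC code: `multiple_of' is a simplified
integer stand-in for `multiple_p', the other helpers are hypothetical,
and the model assumes that each vectorized iteration advances the
address by exactly the total read size from an aligned base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Simplified integer stand-in for GCC's multiple_p (A, B):
   true if A is a multiple of B.  */
static bool
multiple_of (uint64_t a, uint64_t b)
{
  assert (b != 0);
  return a % b == 0;
}

/* True if OBJ_SIZE * N is a power of 2 for some N in [1, MAX_N].  */
static bool
some_multiple_is_pow2 (uint64_t obj_size, uint64_t max_n)
{
  for (uint64_t n = 1; n <= max_n; n++)
    if (is_pow2 (obj_size * n))
      return true;
  return false;
}

/* Assuming an ALIGN-aligned base and a stride equal to READ_SIZE,
   return the first iteration in [0, MAX_ITERS) whose read straddles
   an ALIGN-sized block, or -1 if there is none.  */
static int64_t
first_straddling_iteration (uint64_t read_size, uint64_t align,
                            uint64_t max_iters)
{
  for (uint64_t i = 0; i < max_iters; i++)
    {
      uint64_t start = i * read_size;
      if (start / align != (start + read_size - 1) / align)
        return (int64_t) i;
    }
  return -1;
}
```

Under these assumptions, `multiple_of (512, 364)' is false and
`first_straddling_iteration (364, 512, 16)' returns 1, i.e. the second
iteration already straddles a 512-byte block. With a power-of-2 read
of 256 bytes, `multiple_of (512, 256)' is true and no iteration
straddles.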
