https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122074
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
              What|Removed                     |Added
----------------------------------------------------------------------------
          Severity|normal                      |enhancement
            Status|WAITING                     |UNCONFIRMED
    Ever confirmed|1                           |0
           Summary|Wrong code for avx512       |Not fusing unaligned load
                  |intrinsic                   |into cmp with mask for
                  |                            |avx512 intrinsic
          Keywords|                            |missed-optimization
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Suffix "_u" in __m256i_u emphasizes we are using an unaligned vector which
> should be processed specially
No, it does not mean that. It only means the access is unaligned.
And GCC does in fact use an unaligned load:
vmovdqu ymm1, YMMWORD PTR [rdi]
That is also why, at -O0, the loads are done byte by byte.
Now there is a missed optimization of not fusing the load into the compare.
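
For illustration only, here is a reduced sketch of the kind of code under discussion. It is not the reporter's test case: the function names and the choice of _mm256_cmpeq_epi8_mask are assumptions made for the example. The load goes through a __m256i_u pointer and feeds a mask-producing compare directly. Since an EVEX-encoded compare can take an unaligned memory operand, one would hope for a single instruction along the lines of "vpcmpeqb k0, ymm0, YMMWORD PTR [rdi]" instead of a separate load followed by a register compare. The scalar fallback exists only so the sketch runs on machines without AVX-512.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced case, not the reporter's test: bit i of the result is
   set when byte i of the 32 bytes at p equals c. */
__attribute__((target("avx512bw,avx512vl")))
static uint32_t match_mask_avx512(const void *p, char c)
{
    /* __m256i_u is declared with aligned(1), so this dereference is an
       ordinary unaligned load; nothing more is implied by the "_u". */
    __m256i v = *(const __m256i_u *)p;
    /* Ideally the load above becomes the memory operand of this compare. */
    return _mm256_cmpeq_epi8_mask(v, _mm256_set1_epi8(c));
}

/* Portable fallback so the sketch also runs on CPUs without AVX-512. */
static uint32_t match_mask_scalar(const void *p, char c)
{
    const unsigned char *b = p;
    uint32_t mask = 0;
    for (int i = 0; i < 32; i++)
        if (b[i] == (unsigned char)c)
            mask |= UINT32_C(1) << i;
    return mask;
}

/* Dispatch at run time; p may have any alignment. */
uint32_t match_mask(const void *p, char c)
{
    assert(p != NULL);
    if (__builtin_cpu_supports("avx512bw") && __builtin_cpu_supports("avx512vl"))
        return match_mask_avx512(p, c);
    return match_mask_scalar(p, c);
}
```

Compiling this at -O2 and inspecting the assembly for match_mask_avx512 should show whether the load is emitted separately or folded into the compare.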