https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91735
--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Reducing the VF here should be the goal. For the particular case "filling"
> the holes with neutral data and blending in the original values at store time
> will likely be optimal. So do
>
> tem = vector load
> zero all [4] elements
> compute
> blend in 'tem' into the [4] elements
> vector store

MASKMOVDQU [1] should be an excellent fit here.

[1] https://www.felixcloutier.com/x86/maskmovdqu