https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117978
--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Richard Sandiford from comment #3)
> I think this would be better done in expand rather than gimple. The gimple
> representation would be a vector load in a 128-bit type, followed by a
> zeroing extension to the original SVE type. I'm not sure how easy it is to
> represent the zeroing extension as things stand, but either way, it would be
> converting one load into one load + one other operation. The result seems
> more complicated in gimple terms, so I think the natural gimple fold would
> be in the opposite direction.
>
> If we do it in expand, we'll be able to see the constant if we use an
> appropriate predicate.
>
> Also:
>
> * We should do this for 8-bit, 16-bit, 32-bit, and 64-bit quantities, not
> just 128-bit.
>
> * We should do the same thing for LD2/3/4 and ST2/3/4 (64-bit and 128-bit
> only).
>
> * Except for the single-element case, the optimisation is only valid for
> little-endian targets.

Do we also need to guard this under TARGET_NON_STREAMING?
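
For illustration, the equivalence the quoted comment describes (a predicated SVE load with only the low 128 bits active versus a 128-bit load followed by a zeroing extension) can be modelled in portable C. This is only a sketch under stated assumptions: the 256-bit vector length, the `sve_vec` type and the `model_*` helpers are hypothetical names invented here, not GCC internals or ACLE intrinsics. It works on byte lanes in memory order, so it does not capture the big-endian lane-ordering concern raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: model a 256-bit SVE vector as 32 byte lanes. */
#define VL_BYTES 32
#define Q_BYTES  16   /* 128 bits, the width of an Advanced SIMD Q register */

typedef struct { uint8_t lane[VL_BYTES]; } sve_vec;

/* Hypothetical model of a predicated load: active lanes are read from
   memory, inactive lanes are set to zero. */
static sve_vec model_predicated_load(const bool pred[VL_BYTES],
                                     const uint8_t *src)
{
    sve_vec v;
    for (int i = 0; i < VL_BYTES; i++)
        v.lane[i] = pred[i] ? src[i] : 0;
    return v;
}

/* Hypothetical model of the replacement: one 128-bit load, then a
   zeroing extension to the full vector width. */
static sve_vec model_q_load_zero_extend(const uint8_t *src)
{
    sve_vec v;
    memset(v.lane, 0, VL_BYTES);
    memcpy(v.lane, src, Q_BYTES);
    return v;
}

/* Returns 1 if both models produce the same vector when the predicate
   is a constant with exactly the low 16 byte lanes active. */
static int loads_agree(void)
{
    uint8_t mem[VL_BYTES];
    bool pred[VL_BYTES];
    for (int i = 0; i < VL_BYTES; i++) {
        mem[i]  = (uint8_t)(i + 1);   /* nonzero data in every lane */
        pred[i] = i < Q_BYTES;        /* low 128 bits active only */
    }
    sve_vec a = model_predicated_load(pred, mem);
    sve_vec b = model_q_load_zero_extend(mem);
    return memcmp(a.lane, b.lane, VL_BYTES) == 0;
}
```

The model only holds when the predicate is a known constant, which is the reason the quoted comment gives for doing the transformation where the constant is visible.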