https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109072
--- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Following an off-list discussion: maybe one option (for now) would be to make the aarch64 builtins lowering code look for vld1s whose arguments are ADDR_EXPRs of local VAR_DECLs (or maybe even global VAR_DECLs). It could stash those VAR_DECLs in a function-specific set, and then the aarch64 costing code could look for stores to those VAR_DECLs. If it sees such a store, it would aggressively promote a vector store, in the hope of later propagation and DSE. We could do this even when we don't actually lower the vld1, so that it would work for big-endian too.
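
To make the two steps concrete, here is a rough toy model of the idea in plain C. It is not GCC code: every name in it (toy_var, toy_vld1_set, toy_record_vld1_arg, toy_vector_store_cost) is hypothetical, a "variable" is just a struct standing in for a VAR_DECL, and the zero cost returned for a tracked store is an arbitrary placeholder for "aggressively promote", not a proposed value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VAR_DECL; identity is the pointer value. */
typedef struct { const char *name; } toy_var;

/* Hypothetical per-function set of variables whose address was passed
   to a vld1 call.  A fixed-size array keeps the sketch short.  */
#define TOY_MAX_VARS 16
typedef struct {
  const toy_var *vars[TOY_MAX_VARS];
  size_t count;
} toy_vld1_set;

/* Membership test shared by both steps.  */
static bool toy_set_contains(const toy_vld1_set *set, const toy_var *var)
{
  for (size_t i = 0; i < set->count; i++)
    if (set->vars[i] == var)
      return true;
  return false;
}

/* Step 1 (builtins lowering): remember the variable behind a vld1
   argument.  Returns false only if the toy set is full.  */
static bool toy_record_vld1_arg(toy_vld1_set *set, const toy_var *var)
{
  if (toy_set_contains(set, var))
    return true;
  if (set->count == TOY_MAX_VARS)
    return false;
  set->vars[set->count++] = var;
  return true;
}

/* Step 2 (costing): a vector store into a recorded variable is made
   free in this toy model, so vectorizing the store always wins;
   any other store keeps its base cost.  */
static int toy_vector_store_cost(const toy_vld1_set *set,
                                 const toy_var *dest, int base_cost)
{
  return toy_set_contains(set, dest) ? 0 : base_cost;
}
```

In this model the recording step is independent of whether the vld1 itself gets lowered, which mirrors the point about big-endian: the costing side only needs the set to have been populated.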