https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkretz at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Target|x86-64-v3                   |x86_64-*-*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The memcpy() calls are definitely a hindrance.  I suppose that
update-address-taken could replace some of them with BIT_INSERT_EXPRs
but then it doesn't handle any calls right now.

Replacing the memcpy on its own would be possible (special-casing just
the "sub-vector" case) like

  __builtin_memcpy (&__r, &data, 24);

to

  _1 = __r;
  _2 = data;
  _3 = VEC_PERM <_2, _1, { 0, 1, 2, 7 }>;
  __r = _3;

or if copying a single element using BIT_INSERT_EXPR.  OTOH that's not
good if __r stays in memory (the whole vector store might be good to
avoid STLF fails, but the read will be bad for the same reason).

The update-address-taken pass would know __r and data become registers.
We already have a similar case involving ATOMIC_COMPARE_EXCHANGE that
has delayed processing requiring register arguments.  It might or might
not be a good example of how to deal with this.
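
For illustration only, the memcpy-to-VEC_PERM replacement described in
comment #2 can be modeled in portable C.  This is a sketch of the
semantics, not GCC's implementation: it assumes __r and data are vectors
of four 8-byte lanes (the comment only gives the 24-byte copy size), and
the names vec4d, vec_perm, via_memcpy, via_perm and check_example are
hypothetical helpers introduced here.  A selector value of 0..3 picks a
lane from the first operand and 4..7 a lane from the second, so
{ 0, 1, 2, 7 } takes lanes 0-2 from data and keeps lane 3 of __r.

```c
#include <assert.h>
#include <string.h>

#define LANES 4u  /* assumption: four 8-byte lanes, i.e. a 32-byte vector */

/* Hypothetical stand-in for the vector type of __r and data. */
typedef struct { double lane[LANES]; } vec4d;

/* Models VEC_PERM <a, b, sel>: selector values 0..3 pick a lane of a,
   values 4..7 pick a lane of b. */
static vec4d vec_perm(vec4d a, vec4d b, const unsigned sel[LANES])
{
    vec4d out;
    for (unsigned i = 0; i < LANES; ++i)
        out.lane[i] = sel[i] < LANES ? a.lane[sel[i]]
                                     : b.lane[sel[i] - LANES];
    return out;
}

/* Original form: overwrite the first 24 bytes (three lanes) of r. */
static vec4d via_memcpy(vec4d r, vec4d data)
{
    memcpy(&r, &data, 24);
    return r;
}

/* Proposed form: whole-vector reads of both operands, one permute,
   then a whole-vector write of the result. */
static vec4d via_perm(vec4d r, vec4d data)
{
    static const unsigned sel[LANES] = { 0, 1, 2, 7 };
    return vec_perm(data, r, sel);
}

/* Checks on sample values that both forms produce the same bytes. */
static void check_example(void)
{
    vec4d r    = { { 10.0, 11.0, 12.0, 13.0 } };
    vec4d data = { {  1.0,  2.0,  3.0,  4.0 } };
    vec4d a = via_memcpy(r, data);
    vec4d b = via_perm(r, data);

    assert(sizeof(double) == 8);            /* 24 bytes == three lanes */
    assert(memcmp(&a, &b, sizeof a) == 0);  /* both forms agree */
    assert(b.lane[3] == 13.0);              /* lane 3 of r is preserved */
}
```

Calling check_example() from main exercises both forms.  The model also
shows the trade-off raised in the comment: via_perm reads and writes all
32 bytes of r, whereas via_memcpy only writes 24 of them.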