https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
One issue with

  V load3(const unsigned long* ptr)
  {
    V ret = {};
    __builtin_memcpy(&ret, ptr, 3 * sizeof(unsigned long));

is that we cannot load a vector worth of data from ptr because that might
trap, so rewriting that into

  V tem = *ptr;
  ret = VEC_PERM <ret, tem, { 4, 5, 6, 3 }>;

isn't possible.  There is neither a vector<T, 3> type nor an integer type
with 3 * sizeof (T) size, so a lowering to something like

  x = BIT_FIELD_REF <ptr, ...>
  BIT_FIELD_REF <&ret, ...> = x;

isn't possible either.

But we can possibly handle

  __builtin_memcpy (&A, &B, ...);

when we have an underlying decl of appropriate size/type.  This could be
done in update_address_taken, which can be taught to optimistically analyze
this and rewrite the memcpy when A and B can both become registers.

That would not cover load3 (above problem), or split3_1, concat1_3 or perm
(aggregate and not vector register type).  The latter three would require
adjustments in SRA.  We have for example

  MEM <char[8]> [(struct simd4 *)&ret + 24B] = {};
  __builtin_memcpy (&ret, &x, 24);
  D.3390 = ret;

Here SRA isn't done because ret is addressable because of the memcpy, so it
requires SRA to realize that the addresses still don't escape (all accesses
remain direct).  IIRC there's some improvement in SRA in this respect
already?

As said, there's no good general way to lower this kind of memcpy, but SRA
can treat a memcpy as a partial aggregate assignment (when disqualifying
partial "field" or padding accesses).

IMO enhancing SRA to better handle memcpy/memmove and memset should be a
priority.  It already sees

  Candidate (3316): x
  Candidate (3389): ret
  Candidate (3390): D.3390
  Candidate (3320): ret
  Allowed ADDR_EXPR of ret because of __builtin_memcpy (&ret, &x, 24);
  Allowed ADDR_EXPR of x because of __builtin_memcpy (&ret, &x, 24);
  Will attempt to totally scalarize ret (UID: 3320):
  Will attempt to totally scalarize D.3390 (UID: 3390):
  ! Disqualifying x - No scalar replacements to be created.
  Created a replacement for ret offset: 256, size: 64: ret$b$dataD.3392

but I think it simply fails to handle memcpy in the scalarization attempt
and when building accesses.
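
To make the "memcpy as a partial aggregate assignment" idea concrete, here is a minimal source-level sketch in C. It is not taken from the PR's testcase. The layouts of struct simd3 and struct simd4 and all function names below are assumptions made for illustration, since the comment only shows the GIMPLE. The first function mirrors the quoted sequence (zero the tail, then memcpy the leading bytes). The second spells out the same effect as direct element assignments, which is roughly what one would hope a memcpy-aware SRA could reason about.

```c
#include <string.h>

/* Hypothetical layouts: the real struct simd4 from the PR is not shown
   in the comment, so these are assumptions for illustration only. */
struct simd3 { unsigned long data[3]; };
struct simd4 { unsigned long data[4]; };

/* memcpy form: ret's address is taken, but only by memcpy, so it never
   escapes and every access remains direct. */
struct simd4 widen_memcpy(struct simd3 x)
{
    struct simd4 ret;
    ret.data[3] = 0;               /* like MEM <char[8]> [... + 24B] = {} */
    memcpy(&ret, &x, sizeof x);    /* like __builtin_memcpy (&ret, &x, 24) on LP64 */
    return ret;
}

/* Hand-written equivalent: the memcpy viewed as a partial aggregate
   assignment covering whole elements, with no padding involved. */
struct simd4 widen_fields(struct simd3 x)
{
    struct simd4 ret;
    ret.data[0] = x.data[0];
    ret.data[1] = x.data[1];
    ret.data[2] = x.data[2];
    ret.data[3] = 0;
    return ret;
}

/* Hypothetical helper: returns 1 when both variants produce identical bytes. */
int widen_variants_agree(struct simd3 x)
{
    struct simd4 a = widen_memcpy(x);
    struct simd4 b = widen_fields(x);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Both variants compute the same value. Whether GCC generates the same code for them depends on the SRA handling discussed above, so comparing the -fdump-tree-esra-details output of the two is one way to observe the difference.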