https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78869
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |5.4.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2016-12-20
          Component|c                           |tree-optimization
     Ever confirmed|0                           |1
            Summary|Strange __builtin_memcpy    |[6/7 Regression] Strange
                   |optimisations               |__builtin_memcpy
                   |                            |optimisations
   Target Milestone|---                         |6.3

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's SRA which handles

memcpy_test (char * ptr)
{
  char * D.1764;
  char temp[64];

  try
    {
      D.1764 = ptr + 64;
      MEM[(char * {ref-all})&temp] = MEM[(char * {ref-all})D.1764];
      MEM[(char * {ref-all})ptr] = MEM[(char * {ref-all})&temp];

by "scalarizing" temp byte-wise:

  <bb 2>:
  MEM[(char * {ref-all})&temp] = MEM[(char * {ref-all})ptr_1(D) + 64B];
  temp$0_2 = MEM[(char * {ref-all})ptr_1(D) + 64B];
  temp$1_7 = MEM[(char * {ref-all})ptr_1(D) + 65B];
  ...
  MEM[(char * {ref-all})ptr_1(D)] = temp$0_2;
  MEM[(char * {ref-all})ptr_1(D) + 1B] = temp$1_7;
  MEM[(char * {ref-all})ptr_1(D) + 2B] = temp$2_8;
  MEM[(char * {ref-all})ptr_1(D) + 3B] = temp$3_9;
  ...
  return;

ugh.

Looks like memcpy folding doesn't handle

  D.1765 = ptr + 64;
  __builtin_memcpy (&temp[0], D.1765, 64);
  __builtin_memcpy (ptr, &temp[0], 64);

so we're just "lucky" there.

Regression from GCC 5, the SRA capability was added with GCC 6.
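
For readers without the original report at hand, the memcpy pair in the dump corresponds to C source roughly like the sketch below. This is a reconstruction from the GIMPLE above, not the reporter's actual testcase: only the name memcpy_test, the 64-byte temp array and the ptr + 64 offset come from the dump. The helper check_memcpy_test is hypothetical and is there only to confirm that the function copies the second 64 bytes of a 128-byte buffer over the first.

```c
#include <assert.h>

/* Reconstructed from the GIMPLE dump; an assumption, not the
   reporter's original testcase. */
void memcpy_test(char *ptr)
{
    char temp[64];

    /* Copy bytes 64..127 of the buffer into a local array... */
    __builtin_memcpy(temp, ptr + 64, 64);
    /* ...then copy the local array over bytes 0..63. */
    __builtin_memcpy(ptr, temp, 64);
}

/* Hypothetical helper: returns 1 if memcpy_test moved the data
   as described above, and aborts via assert otherwise. */
int check_memcpy_test(void)
{
    char buf[128];

    for (int i = 0; i < 128; i++)
        buf[i] = (char)i;

    memcpy_test(buf);

    /* The first half now holds what was in the second half. */
    for (int i = 0; i < 64; i++)
        assert(buf[i] == (char)(i + 64));
    /* The second half is unchanged. */
    for (int i = 64; i < 128; i++)
        assert(buf[i] == (char)i);

    return 1;
}
```

Compiling a function like this at -O2 with -fdump-tree-all and looking at the SRA dumps should show whether temp is split into per-byte temp$N scalars as in the comment; the exact dump contents will depend on the GCC version.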