https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96966
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-09-08
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Keywords|                            |alias
   Target Milestone|---                         |8.5

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Sebor from comment #1)
> According to Godbolt, GCC 8.1 and 8.2 emit optimal code for both functions
> but GCC 8.3 emits the less optimal code for f and has g jump to it.
> Starting with 10.1, GCC emits the same suboptimal code for both functions.

This is likely "caused" by 08dfb1d682a707f7319aafec28edda424395dae5, aka
the fix for PR91108, which was also backported.  In the IL

  <bb 2> :
  _5 = MEM <__int128 unsigned> [(char * {ref-all})s_4(D)];
  MEM <__int128 unsigned> [(char * {ref-all})&a] = _5;
  _8 = MEM <__int128 unsigned> [(char * {ref-all})s_4(D)];
  MEM <__int128 unsigned> [(char * {ref-all})&a] = _8;
  return;

we lost the information that MEM <__int128 unsigned> [(char * {ref-all})s_4(D)]
and MEM <__int128 unsigned> [(char * {ref-all})&a] do not partially overlap.
The memcpy call guaranteed that.  The way the aliasing code rules out
partial overlap is by using alignment, which doesn't help us here.

That it worked in GCC 8.[12] was due to bad code in VN that ignored the
possibility of a partial overlap here.

We could eventually lower memcpy (a, s, 16) to load + store while noting
that they are independent using MR_DEPENDENCE_CLIQUE/BASE, but this may
quickly deplete the clique resource on artificial testcases (we only have
16 bits for clique).  Shifting bit allocation between clique and base might
be a possibility there, but at least clique overflow mitigation would need
to be put in place.
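
To illustrate why the possibility of a partial overlap blocks removing the
second load/store pair, here is a minimal sketch in plain C.  It is not the
testcase from this PR (which is not shown in this comment), and it does not
use any GCC internals; copy16 and second_copy_is_redundant are hypothetical
names made up for the illustration.  It models each lowered memcpy as a
16-byte load followed by a 16-byte store and compares copying once against
copying twice for a given byte distance between source and destination.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of one lowered memcpy (a, s, 16):
   a full 16-byte load into a temporary, then a full 16-byte store.  */
void copy16(unsigned char *dst, const unsigned char *src)
{
    unsigned char tmp[16];
    memcpy(tmp, src, 16);   /* the load:  _5 = MEM[s]  */
    memcpy(dst, tmp, 16);   /* the store: MEM[&a] = _5 */
}

/* Return 1 if copying twice leaves memory in the same state as copying
   once, i.e. if the second load/store pair could have been removed.
   `offset` is the byte distance from the destination to the source and
   must be in [-16, 16] so both fit in the 48-byte scratch buffer.  */
int second_copy_is_redundant(int offset)
{
    unsigned char once[48], twice[48];

    assert(offset >= -16 && offset <= 16);

    /* Fill both buffers with distinct byte values.  */
    for (int i = 0; i < 48; i++)
        once[i] = twice[i] = (unsigned char)i;

    /* Destination starts at byte 16, source at byte 16 + offset.  */
    copy16(once + 16, once + 16 + offset);

    copy16(twice + 16, twice + 16 + offset);
    copy16(twice + 16, twice + 16 + offset);

    return memcmp(once, twice, sizeof once) == 0;
}
```

In this model, second_copy_is_redundant(0) (exact overlap) and
second_copy_is_redundant(16) (disjoint) both return 1, while
second_copy_is_redundant(4) returns 0: with a partial overlap the first
store changes the bytes the second load reads, so the second pair is not
dead.  A memcpy call excludes that case, which is the information that is
lost once the call has been lowered to MEM accesses.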