https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-03-08
     Ever confirmed|0                           |1
          Component|middle-end                  |tree-optimization
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
linterchange says:

Consider loop interchange for loop_nest<1 - 3>
Access Strides for DRs:
  a[i_33]:        <0, 4, 0>
  b[i_33]:        <0, 4, 0>
  c[i_33]:        <0, 4, 0>
  a[i_33]:        <0, 4, 0>
  aa[_6][i_33]:   <0, 4, 1024>
  bb[j_34][i_33]: <0, 4, 1024>
  aa[j_34][i_33]: <0, 4, 1024>
Loop(3) carried vars:
  Induction:  j_34 = {1, 1}_3
  Induction:  ivtmp_53 = {255, 4294967295}_3
Loop(2) carried vars:
  Induction:  i_33 = {0, 1}_2
  Induction:  ivtmp_51 = {256, 4294967295}_2

and then doesn't do anything. I suppose the best thing to do here is to first distribute the loop nest, but our cost modeling fuses the two obvious candidates:

Fuse partitions because they have shared memory refs:
  Part 1: 0, 1, 2, 3, 4, 5, 6, 7, 19, 20, 21
  Part 2: 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

so this is a case that asks for better cost modeling there.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
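The dump above is consistent with an imperfect nest where an a[i] update sits between the i loop and the j loop, and the j loop walks aa and bb one row (1024 bytes) at a time. The C sketch below is a hypothetical reconstruction inferred only from the data references and induction variables in the dump. It is not the testcase attached to the bug. LEN = 256 comes from the ivtmp counts and the 1024-byte stride, assuming 4-byte floats. The stride-0 outermost loop 1 is left out. The names init_arrays, original_nest, distributed_interchanged, stride_along_i, stride_along_j and results_match, and the initial values, are illustrative assumptions. The sketch contrasts the original shape with the "distribute first, then interchange" form suggested in the comment and checks that both compute identical results. GCC would do this on GIMPLE in its own passes; the code only illustrates the source-level effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction: 256 iterations of i, 255 of j starting at 1,
   and a 1024-byte (256-float) row stride, as suggested by the dump. */
#define LEN 256

static float a[LEN], b[LEN], c[LEN];
static float aa[LEN][LEN], bb[LEN][LEN];

/* Illustrative initial values: short binary fractions, so every product and
   sum below is exact in float whatever the evaluation order. */
void init_arrays (void)
{
  for (int i = 0; i < LEN; i++)
    {
      a[i] = (float) (i % 7);
      b[i] = 0.5f * (float) (i % 3);
      c[i] = 0.25f;
      for (int j = 0; j < LEN; j++)
        {
          aa[i][j] = (float) ((i + j) % 5);
          bb[i][j] = 0.125f * (float) ((i * j) % 4);
        }
    }
}

/* Original shape: the a[i] statement sits between the i and j loops, so the
   nest is not perfect and cannot be interchanged as-is.  The inner j loop
   steps through aa and bb one row (LEN floats) at a time. */
void original_nest (void)
{
  for (int i = 0; i < LEN; i++)
    {
      a[i] += b[i] * c[i];
      for (int j = 1; j < LEN; j++)
        aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
    }
}

/* Distribute first: the a[i] update gets its own loop.  This is legal because
   the aa statement for a given i only reads a[i], which is already final.
   The remaining nest is perfect; swapping the loops is legal because the only
   carried dependence, aa[j - 1][i] -> aa[j][i], has distance (0, 1) in (i, j)
   and stays lexicographically positive as (1, 0) in (j, i).  The inner loop
   now steps through aa and bb one float at a time. */
void distributed_interchanged (void)
{
  for (int i = 0; i < LEN; i++)
    a[i] += b[i] * c[i];
  for (int j = 1; j < LEN; j++)
    for (int i = 0; i < LEN; i++)
      aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
}

/* Byte distance between consecutive accesses to aa when i, respectively j,
   is the inner loop; with 4-byte floats these are the 4 and 1024 in the
   dump's "<0, 4, 1024>" strides. */
ptrdiff_t stride_along_i (void)
{
  return (char *) &aa[0][1] - (char *) &aa[0][0];
}

ptrdiff_t stride_along_j (void)
{
  return (char *) &aa[1][0] - (char *) &aa[0][0];
}

static float a_ref[LEN], aa_ref[LEN][LEN];

/* Run both versions from the same initial state; return 1 if a and aa end up
   bit-for-bit identical, 0 otherwise. */
int results_match (void)
{
  init_arrays ();
  original_nest ();
  memcpy (a_ref, a, sizeof a);
  memcpy (aa_ref, aa, sizeof aa);

  init_arrays ();
  distributed_interchanged ();
  return memcmp (a_ref, a, sizeof a) == 0
         && memcmp (aa_ref, aa, sizeof aa) == 0;
}

/* Usage, e.g. from main:
     assert (results_match ());
     assert (stride_along_j () == LEN * (ptrdiff_t) sizeof (float));  */
```

The two steps are kept separate because, as the comment notes, interchange alone sees an imperfect nest. Only once the a[i] statement is split off does the remaining aa/bb nest become a candidate.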