https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-03-08
     Ever confirmed|0                           |1
          Component|middle-end                  |tree-optimization
           Keywords|                            |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The linterchange dump says:
Consider loop interchange for loop_nest<1 - 3>
Access Strides for DRs:
a[i_33]: <0, 4, 0>
b[i_33]: <0, 4, 0>
c[i_33]: <0, 4, 0>
a[i_33]: <0, 4, 0>
aa[_6][i_33]: <0, 4, 1024>
bb[j_34][i_33]: <0, 4, 1024>
aa[j_34][i_33]: <0, 4, 1024>
Loop(3) carried vars:
Induction: j_34 = {1, 1}_3
Induction: ivtmp_53 = {255, 4294967295}_3
Loop(2) carried vars:
Induction: i_33 = {0, 1}_2
Induction: ivtmp_51 = {256, 4294967295}_2
and then doesn't do anything.
I suppose the best thing to do here is to first distribute the loop nest,
but the loop distribution cost model fuses the two obvious candidates:
Fuse partitions because they have shared memory refs:
Part 1: 0, 1, 2, 3, 4, 5, 6, 7, 19, 20, 21
Part 2: 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
so this is a case that asks for better cost modeling there.
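To make the suggestion concrete, here is a rough C sketch of what distribution followed by interchange could look like. The testcase is not quoted in this comment, so the nest below is only a reconstruction from the data references and strides in the dump: 4-byte elements are assumed to be float, N = 256 is inferred from the 1024-byte row stride and the induction counts, and the function names are hypothetical. Loop 1 of the nest, in which every reference has stride 0, is left out.

```c
#include <string.h>

#define N 256  /* assumed: 1024-byte row stride / 4-byte elements */

static float a[N], b[N], c[N];
static float aa[N][N], bb[N][N];

/* Assumed shape of loops 2 (i) and 3 (j), reconstructed from the dump.
   The inner j loop walks aa and bb with a 1024-byte stride.  */
void nest_original(void)
{
    for (int i = 0; i < N; i++) {
        a[i] += b[i] * c[i];
        for (int j = 1; j < N; j++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
    }
}

/* The same computation after distributing the a[] update into its own
   loop and interchanging the remaining nest so that i is innermost.  */
void nest_distributed_interchanged(void)
{
    for (int i = 0; i < N; i++)
        a[i] += b[i] * c[i];

    for (int j = 1; j < N; j++)
        for (int i = 0; i < N; i++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
}

/* Fill the inputs with small integer values so that all float
   arithmetic in both variants is exact.  */
void init_inputs(void)
{
    for (int i = 0; i < N; i++) {
        a[i] = (float)(i % 5);
        b[i] = (float)(i % 3);
        c[i] = 1.0f;
    }
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++) {
            aa[j][i] = (float)((i + j) % 7);
            bb[j][i] = (float)((i + j) % 4);
        }
}

/* Run both variants on identical inputs; return 1 if a and aa match.  */
int check_equivalence(void)
{
    static float a_ref[N], aa_ref[N][N];

    init_inputs();
    nest_original();
    memcpy(a_ref, a, sizeof a);
    memcpy(aa_ref, aa, sizeof aa);

    init_inputs();
    nest_distributed_interchanged();

    return memcmp(a_ref, a, sizeof a) == 0
           && memcmp(aa_ref, aa, sizeof aa) == 0;
}
```

Under these assumptions the only loop-carried dependence is aa[j][i] on aa[j-1][i], which is carried by j with distance 0 in i, so it stays lexicographically positive after the interchange. The inner i loop then has unit stride and carries no dependence, which is the form the vectorizer wants.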
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations