https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441
Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|[14 Regression] Fail to     |[14 Regression] Fail to
                   |fold the last element with  |fold the last element with
                   |multiple loop               |multiple loop since
                   |                            |g:2efe3a7de0107618397264017
                   |                            |fb045f237764cc7
   Last reconfirmed|                            |2024-02-22
             Status|UNCONFIRMED                 |NEW
           Keywords|needs-bisection             |

--- Comment #26 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #18)
> (In reply to Tamar Christina from comment #17)
> > Ok, bisected to
> >
> > g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit
> > commit 2efe3a7de0107618397264017fb045f237764cc7
> > Author: Hao Liu <h...@os.amperecomputing.com>
> > Date:   Wed Dec 6 14:52:19 2023 +0800
> >
> >     tree-optimization/112774: extend the SCEV CHREC tree with a nonwrapping
> > flag
> >
> > Before this commit we were unable to analyse the stride of the access.
> > After this niters seems to estimate the loop trip count at 4 and after that
> > the logs diverge enormously.
>
> Hum, but that's backward and would match to what I said in comment#2 - we
> should get better code with that.
>

Ok, so I've dug more into this today. It's definitely this commit that's
causing it. The reason is that we no longer consider masked gather/scatters.

Before this commit the gather/scatter pattern would trigger:

tresg.i:3:275: note: gather/scatter pattern: detected: a[_2] = b.3_3;
tresg.i:3:275: note: gather_scatter pattern recognized: .SCATTER_STORE ((sizetype) &a, _2, 4, b.3_3);

and the use of the masked scatter is what makes the epilogue unnecessary, which
is why it generates better code. It's not the loads.

The issue is that vect_analyze_data_refs only considers gather/scatters IF DR
analysis fails, which it did before:

tresg.c:31:29: missed: failed: evolution of offset is not affine.
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        base alignment: 0
        base misalignment: 0
        offset alignment: 0
        step alignment: 0
        base_object: array1
        Access function 0: {{m_112 * 2, +, 24}_3, +, 2}_4
        Access function 1: 0
Creating dr for array1[0][_8]

this now succeeds after the quoted commit:

success.
        base_address: &array1
        offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2)
        constant offset from base address: 0
        step: 4
        base alignment: 8
        base misalignment: 0
        offset alignment: 4
        step alignment: 4
        base_object: array1
        Access function 0: {{m_112 * 2, +, 24}_3, +, 2}_4
        Access function 1: 0
Creating dr for array1[0][_8]

so we never enter

  /* Check that analysis of the data-ref succeeded.  */
  if (!DR_BASE_ADDRESS (dr)
      || !DR_OFFSET (dr)
      || !DR_INIT (dr)
      || !DR_STEP (dr))
    {

and without the IFN scatters it tries deinterleaving scalar stores to scatters:

tresg.c:29:22: note: Detected single element interleaving array1[0][_8] step 4
tresg.c:29:22: note: Detected single element interleaving array1[1][_8] step 4
tresg.c:29:22: note: Detected single element interleaving array1[2][_8] step 4
tresg.c:29:22: note: Detected single element interleaving array1[3][_8] step 4
tresg.c:29:22: note: Detected single element interleaving array1[0][_1] step 4
tresg.c:29:22: note: Detected single element interleaving array1[1][_1] step 4
tresg.c:29:22: note: Detected single element interleaving array1[2][_1] step 4
tresg.c:29:22: note: Detected single element interleaving array1[3][_1] step 4
tresg.c:29:22: missed: not consecutive access array2[_4][_8] = _70;
tresg.c:29:22: note: using strided accesses
tresg.c:29:22: missed: not consecutive access array2[_4][_1] = _68;
tresg.c:29:22: note: using strided accesses
...
tresg.c:29:22: note: using gather/scatter for strided/grouped access, scale = 2

but without the SCATTER_STORE IFN it never tries masking the scatter, so we lose
MASK_SCATTER_STORE and hence we generate worse code, because the whole loop can
no longer be predicated.

However, trying to force it generates an ICE, so I guess it's not that simple.
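
As a reading aid, the decision described in this comment can be sketched as a
small toy model in C. This is not GCC code: every name below (toy_dr,
toy_choose_path, toy_can_use_masked_scatter and the enum values) is
hypothetical, and the model assumes only what the comment states, namely that
the gather/scatter IFN path is considered solely when data-ref analysis fails,
and that masking the scatter is only attempted on that path.

```c
/* Toy model only.  None of these names exist in GCC; they mirror the
   behaviour described in the comment, not the real implementation.  */
#include <stdbool.h>

/* Stand-in for the four data-ref fields tested by the quoted check
   (DR_BASE_ADDRESS, DR_OFFSET, DR_INIT, DR_STEP).  */
struct toy_dr
{
  bool has_base_address;
  bool has_offset;
  bool has_init;
  bool has_step;
};

enum toy_path
{
  TOY_GATHER_SCATTER_IFN, /* e.g. .SCATTER_STORE, which can be masked.  */
  TOY_REGULAR_DR          /* strided/grouped access handling.  */
};

/* True when every field was computed, i.e. DR analysis "succeeded".  */
static bool
toy_dr_analysis_ok (const struct toy_dr *dr)
{
  return dr->has_base_address && dr->has_offset
	 && dr->has_init && dr->has_step;
}

/* The gather/scatter IFN path is only taken when analysis failed.  */
static enum toy_path
toy_choose_path (const struct toy_dr *dr)
{
  if (!toy_dr_analysis_ok (dr))
    return TOY_GATHER_SCATTER_IFN;
  return TOY_REGULAR_DR;
}

/* In this model a masked scatter is only reachable via the IFN path.  */
static bool
toy_can_use_masked_scatter (enum toy_path path)
{
  return path == TOY_GATHER_SCATTER_IFN;
}
```

In this model, a data-ref with all four fields missing (the situation before
the quoted commit) selects TOY_GATHER_SCATTER_IFN and can be masked, while a
fully analysed data-ref (the situation after it) selects TOY_REGULAR_DR and
cannot. The ICE seen when forcing the IFN path is not modelled.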