https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|[14 Regression] Fail to     |[14 Regression] Fail to
                   |fold the last element with  |fold the last element with
                   |multiple loop               |multiple loop since
                   |                            |g:2efe3a7de0107618397264017
                   |                            |fb045f237764cc7
   Last reconfirmed|                            |2024-02-22
             Status|UNCONFIRMED                 |NEW
           Keywords|needs-bisection             |

--- Comment #26 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #18)
> (In reply to Tamar Christina from comment #17)
> > Ok, bisected to
> > 
> > g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit
> > commit 2efe3a7de0107618397264017fb045f237764cc7
> > Author: Hao Liu <h...@os.amperecomputing.com>
> > Date:   Wed Dec 6 14:52:19 2023 +0800
> > 
> >     tree-optimization/112774: extend the SCEV CHREC tree with a nonwrapping
> > flag
> > 
> > Before this commit we were unable to analyse the stride of the access.
> > After this niters seems to estimate the loop trip count at 4 and after that
> > the logs diverge enormously.
> 
> Hum, but that's backward and would match to what I said in comment#2 - we
> should get better code with that.
> 

Ok, so I've dug more into this today.  It's definitely this commit that's
causing it.  The reason is we no longer consider masked gather/scatters.

Before this commit the gather pattern would trigger:

tresg.i:3:275: note:   gather/scatter pattern: detected: a[_2] = b.3_3;
tresg.i:3:275: note:   gather_scatter pattern recognized: .SCATTER_STORE ((sizetype) &a, _2, 4, b.3_3);

and the use of the masked scatter is what's causing the epilogue to not be
required and why it generates better code.  It's not the loads.

The issue is that vect_analyze_data_refs only considers gather/scatters IF DR
analysis fails, which it did before:

tresg.c:31:29: missed:  failed: evolution of offset is not affine.
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        base alignment: 0
        base misalignment: 0
        offset alignment: 0
        step alignment: 0
        base_object: array1
        Access function 0: {{m_112 * 2, +, 24}_3, +, 2}_4
        Access function 1: 0
Creating dr for array1[0][_8]

this now succeeds after the quoted commit:

success.
        base_address: &array1
        offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2)
        constant offset from base address: 0
        step: 4
        base alignment: 8
        base misalignment: 0
        offset alignment: 4
        step alignment: 4
        base_object: array1
        Access function 0: {{m_112 * 2, +, 24}_3, +, 2}_4
        Access function 1: 0
Creating dr for array1[0][_8]

so we never enter

      /* Check that analysis of the data-ref succeeded.  */
      if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
          || !DR_STEP (dr))
        {

and without the IFN scatters it tries deinterleaving scalar stores to scatters:

tresg.c:29:22: note:   Detected single element interleaving array1[0][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[1][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[2][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[3][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[0][_1] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[1][_1] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[2][_1] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[3][_1] step 4
tresg.c:29:22: missed:   not consecutive access array2[_4][_8] = _70;
tresg.c:29:22: note:   using strided accesses
tresg.c:29:22: missed:   not consecutive access array2[_4][_1] = _68;
tresg.c:29:22: note:   using strided accesses

...

tresg.c:29:22: note:   using gather/scatter for strided/grouped access, scale = 2

but without the SCATTER_STORE IFN it never tries masking the scatter, so we
lose MASK_SCATTER_STORE and hence we generate worse code because the whole loop
can no longer be predicated.

However, trying to force it generates an ICE, so I guess it's not that simple.
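
To make the control-flow dependency described above concrete, here is a toy
model in C.  It is not GCC code: struct toy_dr, enum toy_strategy and
toy_pick_strategy are hypothetical names invented for illustration, and the
model deliberately ignores every other condition the vectorizer checks.  It
only captures the observation that the gather/scatter pattern path is reached
when one of the data-reference fields is missing, so an analysis that newly
succeeds takes the strided path instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the result of data-reference analysis.  The
   fields loosely mirror DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP;
   a NULL field means the analysis did not produce that component.  */
struct toy_dr
{
  const char *base_address;
  const char *offset;
  const char *init;
  const char *step;
};

/* Hypothetical outcomes, named after the two behaviours in the dumps.  */
enum toy_strategy
{
  TOY_IFN_SCATTER,   /* .SCATTER_STORE recognized, masking is possible.  */
  TOY_STRIDED_ACCESS /* "using strided accesses", no masked scatter.  */
};

/* Assumption for illustration only: the scatter pattern is considered
   solely when the data-ref analysis failed, as in the check quoted from
   vect_analyze_data_refs.  */
static enum toy_strategy
toy_pick_strategy (const struct toy_dr *dr)
{
  bool analysis_failed = !dr->base_address || !dr->offset
                         || !dr->init || !dr->step;
  return analysis_failed ? TOY_IFN_SCATTER : TOY_STRIDED_ACCESS;
}
```

In this sketch a data-ref with all four fields empty (the "evolution of offset
is not affine" case) maps to TOY_IFN_SCATTER, while one with base_address
"&array1" and step "4" maps to TOY_STRIDED_ACCESS, which is the path that
loses MASK_SCATTER_STORE.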
