https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
d_vampile <d_vampile at 163 dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |d_vampile at 163 dot com

--- Comment #48 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Jiu Fu Guo from comment #41)
> (In reply to Wilco from comment #40)
> > (In reply to Jiu Fu Guo from comment #39)
> > > I'm thinking of drafting a patch for this optimization. If you have
> > > any suggestions, please point them out, thanks.
> >
> > Which optimization, to be precise? Besides unrolling, I haven't seen a
> > proposal for an optimization which is both safe and generally
> > applicable.
>
> 1. With unrolling, there are still branches in the loop, and the loads
> and comparisons then need careful merging. Another issue with unrolling
> is that, if we prefer to optimize this early in GIMPLE, the GIMPLE
> unrolling passes have not yet run at that point.
>
>   while (len != max)
>     {
>       if (p[len] != cur[len])
>         break;
>       ++len;
>       if (p[len] != cur[len])
>         break;
>       ++len;
>       if (p[len] != cur[len])
>         break;
>       ++len;
>       ....
>     }
>
> 2. It may also make sense to enhance the GIMPLE vectorization pass: use
> a vector to load and compare, merge the scalar compares into a vector
> compare, and handle the early exit carefully.
>
>   if (len + 8 < max && buffers do not cross a page) /* (p & 4K) == ((p + 8) & 4K)?  4K: page size */
>     while (len != max)
>       {
>         vec a = xx p;
>         vec b = xx cur;
>         if (a != b)  /* may not be only a comparison */
>           { ....; break; }
>         len += 8;
>       }
>
> 3. Introduce a new stand-alone pass that widens reads/computations on
> shorter types into larger (dword/vector) reads/computations.
>
> Thanks a lot for your comments/suggestions!

Is there any progress, or are there patches, for the new pass mentioned in
point 3? Or any new ideas?
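For reference, here is a minimal C sketch of the widening idea from point 3, written at the source level rather than as a GIMPLE pass. The function name `match_len` and the fixed 8-byte word size are illustrative assumptions, not taken from any proposed patch. `memcpy` is used for the unaligned loads so the code stays well-defined C; compilers typically lower it to a single load on targets that permit unaligned access. The `__builtin_ctzll` mismatch location assumes GCC/Clang and a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "wider reads" optimization: compute the
   length of the common prefix of p and cur (capped at max), comparing
   8 bytes at a time instead of byte by byte.  */
static size_t
match_len (const uint8_t *p, const uint8_t *cur, size_t max)
{
  size_t len = 0;

  /* Word-at-a-time main loop.  */
  while (len + 8 <= max)
    {
      uint64_t a, b;
      memcpy (&a, p + len, 8);    /* well-defined unaligned load */
      memcpy (&b, cur + len, 8);
      uint64_t diff = a ^ b;
      if (diff != 0)
        /* Little-endian only: the first differing byte corresponds to
           the lowest set byte of diff.  */
        return len + (__builtin_ctzll (diff) >> 3);
      len += 8;
    }

  /* Byte-wise tail for the remaining 0..7 bytes.  */
  while (len != max && p[len] == cur[len])
    ++len;
  return len;
}
```

A pass implementing this transformation automatically would additionally have to prove that the wider loads cannot fault, e.g. via the page-cross check sketched in point 2, since the source-level byte loop only ever touches bytes up to the first mismatch.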