https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

--- Comment #41 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Wilco from comment #40)
> (In reply to Jiu Fu Guo from comment #39)
> > I’m thinking to draft a patch for this optimization.  If any suggestions,
> > please point out, thanks.
> 
> Which optimization to be precise? Besides unrolling I haven't seen a
> proposal for an optimization which is both safe and generally applicable.

1. For unrolling, there are still branches inside the loop, and the reads and
comparisons then need to be merged carefully.  Another issue with unrolling is
that, if we prefer to do this optimization early in GIMPLE, GIMPLE currently
does not unroll this kind of loop.
while (len != max)
  {
    if (p[len] != cur[len])
      break;
    ++len;
    if (p[len] != cur[len])
      break;
    ++len;
    if (p[len] != cur[len])
      break;
    ++len;
    ....
  }

2. I am also thinking about whether it makes sense to enhance the GIMPLE
vectorization pass: use vector reads and compares, merge the scalar compares
into a vector compare, and handle the early exit carefully.
if (len + 8 < max && reads do not cross a page) /* (p & -4K) == ((p+8) & -4K)?  4K : pagesize */
  while (len != max)
    {
      vec a = load (p + len);
      vec b = load (cur + len);
      if (a != b)  /* may need more than a plain comparison */
        {
          ....;
          break;
        }
      len += 8;
    }

3. Introduce a new stand-alone pass that widens reads/computations on shorter
types into larger (dword/vector) reads/computations.

Thanks a lot for your comments/suggestions!
