https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
d_vampile <d_vampile at 163 dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |d_vampile at 163 dot com

--- Comment #48 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Jiu Fu Guo from comment #41)
> (In reply to Wilco from comment #40)
> > (In reply to Jiu Fu Guo from comment #39)
> > > I'm thinking of drafting a patch for this optimization. If you have
> > > any suggestions, please point them out, thanks.
> >
> > Which optimization, to be precise? Besides unrolling, I haven't seen a
> > proposal for an optimization which is both safe and generally
> > applicable.
>
> 1. With unrolling, there are still branches in the loop, and the loads
> and comparisons then need careful merging. Another issue with unrolling
> is that, if we prefer to optimize this early in GIMPLE, the GIMPLE
> unrolling passes have not yet run at that point.
>
>   while (len != max)
>     {
>       if (p[len] != cur[len])
>         break;
>       ++len;
>       if (p[len] != cur[len])
>         break;
>       ++len;
>       if (p[len] != cur[len])
>         break;
>       ++len;
>       ....
>     }
>
> 2. It may also make sense to enhance the GIMPLE vectorization pass: use
> a vector to load and compare, merge the scalar compares into a vector
> compare, and handle the early exit carefully.
>
>   if (len + 8 < max && buffers do not cross a page) /* (p & 4K) == ((p + 8) & 4K)?  4K: page size */
>     while (len != max)
>       {
>         vec a = xx p;
>         vec b = xx cur;
>         if (a != b)  /* may not be only a comparison */
>           { ....; break; }
>         len += 8;
>       }
>
> 3. Introduce a new stand-alone pass that widens reads/computations on
> shorter types into larger (dword/vector) reads/computations.
>
> Thanks a lot for your comments/suggestions!

Is there any progress, or are there patches, for the new pass mentioned in
point 3? Or any new ideas?
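For reference, here is a minimal C sketch of the widening idea from point 3, written at the source level rather than as a GIMPLE pass. The function name `match_len` and the fixed 8-byte word size are illustrative assumptions, not taken from any proposed patch. `memcpy` is used for the unaligned loads so the code stays well-defined C; compilers typically lower it to a single load on targets that permit unaligned access. The `__builtin_ctzll` mismatch location assumes GCC/Clang and a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "wider reads" optimization: compute the
   length of the common prefix of p and cur (capped at max), comparing
   8 bytes at a time instead of byte by byte.  */
static size_t
match_len (const uint8_t *p, const uint8_t *cur, size_t max)
{
  size_t len = 0;

  /* Word-at-a-time main loop.  */
  while (len + 8 <= max)
    {
      uint64_t a, b;
      memcpy (&a, p + len, 8);    /* well-defined unaligned load */
      memcpy (&b, cur + len, 8);
      uint64_t diff = a ^ b;
      if (diff != 0)
        /* Little-endian only: the first differing byte corresponds to
           the lowest set byte of diff.  */
        return len + (__builtin_ctzll (diff) >> 3);
      len += 8;
    }

  /* Byte-wise tail for the remaining 0..7 bytes.  */
  while (len != max && p[len] == cur[len])
    ++len;
  return len;
}
```

A pass implementing this transformation automatically would additionally have to prove that the wider loads cannot fault, e.g. via the page-cross check sketched in point 2, since the source-level byte loop only ever touches bytes up to the first mismatch.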