On Wed, Mar 31, 2021 at 6:47 AM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > > >
> > > > Patch is OK now. I was wondering about using avx256 for moves of known
> > >
> > > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can
> > > you take a look at the patch for Skylake:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> >
> > I was wondering, if the CPU prefers rep movsb when rcx is a compile time
> > constant, it probably does some logic at decode time (i.e. expands
> > it into some sequence) and if so, then it may require the code setting
> > the register to be near the rep (via fusing or a similar mechanism)
> >
> > Perhaps we want to have a fusing pattern for this, so we do not move them
> > far apart?
>
> Reading through the optimization manual it seems that movsb is fast for
> small blocks no matter if the size is hard wired. In that case you
> probably want to check whether max_size or expected_size is known to be
> small rather than max_size == min_size and both being small.
>
> But it depends on what the CPU really does.
> Honza

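To make the two checks being compared concrete, here is a minimal sketch.
It is not the code in the patch: the function names, the threshold value
and the use of -1 for "unknown" are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical cutoff for a "small" block; not a value taken from GCC
   or from the optimization manual.  */
#define SMALL_BLOCK_THRESHOLD 128

/* Convention of this sketch only: -1 means "not known at compile time".  */
#define SIZE_UNKNOWN (-1L)

/* Check A: the size is a compile-time constant (min == max) and small.  */
bool
small_known_size (long min_size, long max_size)
{
  return max_size != SIZE_UNKNOWN
         && min_size == max_size
         && max_size <= SMALL_BLOCK_THRESHOLD;
}

/* Check B: the size need not be constant; a small upper bound or a
   small expected size is enough.  */
bool
small_bounded_or_expected_size (long max_size, long expected_size)
{
  bool max_small = max_size != SIZE_UNKNOWN
                   && max_size <= SMALL_BLOCK_THRESHOLD;
  bool expected_small = expected_size != SIZE_UNKNOWN
                        && expected_size <= SMALL_BLOCK_THRESHOLD;
  return max_small || expected_small;
}
```

For a copy whose length is only known to lie between 16 and 64 bytes,
check A is false while check B is true, which is the case the suggestion
above is about.
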
For small data sizes, rep movsb is faster only under certain conditions.
We can continue fine tuning rep movsb.

--
H.J.