On Wed, Mar 31, 2021 at 6:47 AM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > > >
> > > > Patch is OK now.  I was wondering about using avx256 for moves of known
> > >
> > > Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
> > > you take a look at the patch for Skylake:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> >
> > I was wondering, if the CPU prefers rep movsb when rcx is a compile-time
> > constant, it probably does some logic at decode time (i.e. expands
> > it into some sequence) and if so, then it may require the code setting
> > the register to be near the rep (via fusing or a similar mechanism).
> >
> > Perhaps we want to have fusing pattern for this, so we do not move them
> > far apart?
>
> Reading through the optimization manual, it seems that movsb is fast for
> small blocks no matter if the size is hard wired. In that case you
> probably want to check whether max_size or expected_size is known to be
> small rather than max_size == min_size and both being small.
>
> But it depends on what CPU really does.
> Honza

For small data sizes, rep movsb is faster only under certain conditions.  We
can continue fine-tuning rep movsb.
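
To make the two checks contrasted in the quoted text concrete, here is a
minimal sketch in C.  It is not GCC's code: struct size_info, the two
predicates, SMALL_BLOCK_MAX and its value, and the "0 means unknown"
convention are all hypothetical and only illustrate the difference between
"size is a small constant" and "size is known to be bounded by something
small".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold for "small"; not taken from GCC or any manual. */
#define SMALL_BLOCK_MAX 128

/* Hypothetical summary of what is known about a block size at compile time.
   In this sketch, 0 in max_size or expected_size means "unknown". */
struct size_info {
    size_t min_size;
    size_t max_size;
    size_t expected_size;
};

/* Stricter check: the size is a single known constant and it is small. */
static bool known_small_constant(const struct size_info *s)
{
    return s->max_size != 0
           && s->min_size == s->max_size
           && s->max_size <= SMALL_BLOCK_MAX;
}

/* Looser check: either the upper bound or the expected size is known
   to be small, even if the exact size is not a constant. */
static bool known_small_bound(const struct size_info *s)
{
    bool max_small = s->max_size != 0 && s->max_size <= SMALL_BLOCK_MAX;
    bool expected_small = s->expected_size != 0
                          && s->expected_size <= SMALL_BLOCK_MAX;
    return max_small || expected_small;
}
```

With these definitions, a size known only to lie in the range 1..100 passes
known_small_bound but not known_small_constant.  Which of the two is the
better condition still depends on what the CPU really does.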

-- 
H.J.
