memset inline strategies for Ice Lake

Jan Hubicka Wed, 31 Mar 2021 10:43:36 -0700

> > Reading through the optimization manual it seems that movsb is fast for
> > small blocks no matter whether the size is hard wired. In that case you
> > probably want to check whether max_size or expected_size is known to be
> > small rather than max_size == min_size and both being small.
> >
> > But it depends on what CPU really does.
> > Honza
> 
> For small data size, rep movsb is faster only under certain conditions.   We
> can continue fine tuning rep movsb.


OK, however I wonder why you need the condition maxsize=minsize.
 - If the CPU is looking for movl $cst, %rcx, then we probably want to be
   sure that it is not moved away from rep movsb, by adding a fused pattern
 - If rep movsb is slower than a loop for very small blocks, then you want
   to set a lower bound on minsize & expected size, but you do not need
   to require maxsize=minsize
 - If rep movsb is slower than a sequence of moves for small blocks, then
   one needs to tweak move by pieces
 - If rep movsb is slower for larger blocks, then you want to test
   maxsize and expected size
So in none of those scenarios does testing maxsize=minsize alone make
much sense to me... What was the original motivation for treating a
precisely known size differently?
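
To make the cases above concrete, here is a minimal sketch of a selection
routine that looks at the size bounds and never tests maxsize=minsize.
Everything in it is hypothetical: the enum, the function name, the
UNKNOWN_SIZE marker and both thresholds are made up for illustration and
are not GCC's actual interfaces or tuning values.

```c
#include <stddef.h>

/* Hypothetical strategy names, not GCC's real enum. */
enum strategy {
    STRATEGY_BY_PIECES,   /* short sequence of moves */
    STRATEGY_LOOP,        /* simple copy loop */
    STRATEGY_REP_MOVSB,   /* rep movsb */
    STRATEGY_LIBCALL      /* call the library routine */
};

/* Assumed marker for "bound or expectation not known". */
#define UNKNOWN_SIZE ((size_t)-1)

/* Made-up thresholds; real values would come from CPU measurements. */
#define SMALL_LIMIT 16
#define LARGE_LIMIT 8192

enum strategy pick_strategy(size_t min_size, size_t max_size,
                            size_t expected_size)
{
    /* Prefer the expected size when one is known, else the upper bound. */
    size_t likely = expected_size != UNKNOWN_SIZE ? expected_size : max_size;

    /* A small upper bound is enough for a short move sequence;
       min_size == max_size is not required. */
    if (max_size <= SMALL_LIMIT)
        return STRATEGY_BY_PIECES;

    /* The block cannot be tiny and is not expected to be huge. */
    if (min_size > SMALL_LIMIT && likely <= LARGE_LIMIT)
        return STRATEGY_REP_MOVSB;

    /* Probably small, but no usable bound on the maximum. */
    if (likely <= SMALL_LIMIT)
        return STRATEGY_LOOP;

    return STRATEGY_LIBCALL;
}
```

With these made-up limits, a block known only to be at most 12 bytes gets
the move sequence even though its minimum size is unknown.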

I am mostly curious because it is not that uncommon to have a small maxsize
because we are able to track the object size, and using a short sequence
for those would be nice.

Having a non-trivial minsize may not be that uncommon these days either,
given that we track value ranges (and under the assumption that the
memcpy/memset expanders were updated to take these into account).
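
As an illustration of where a non-trivial range can come from, consider
the sketch below. The function name and the bounds 8 and 64 are made up.
After the early return, the size passed to memset lies in [8, 64], so it
has a non-trivial minimum and a small maximum without being a constant.
Whether the expander actually sees that range is the assumption stated
above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: clears the first n bytes of buf, but only for
   8 <= n <= 64.  Any other n leaves the buffer untouched. */
void clear_prefix(char *buf, size_t n)
{
    if (n < 8 || n > 64)
        return;

    /* Here n is known to be in [8, 64] although it is not constant. */
    memset(buf, 0, n);
}
```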

Honza
> 
> -- 
> H.J.

Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
