On Tue, May 14, 2019 at 9:21 PM Aaron Sawdey <acsaw...@linux.ibm.com> wrote:
>
> GCC does not currently do inline expansion of overlapping memmove, nor does it
> have an expansion pattern to allow for non-overlapping memcpy, so I plan to add
> patterns and support to implement this in gcc 10 timeframe.
>
> At present memcpy and memmove are kind of entangled. Here's the current state
> of play:
>
> memcpy -> expand with movmem pattern
> memmove (no overlap) -> transform to memcpy -> expand with movmem pattern
> memmove (overlap) -> remains memmove -> glibc call
>
> There are several problems currently. If the memmove() arguments are in fact
> overlapping, then the expansion is not used at all, which makes no sense and
> costs performance: small blocks go through a library call instead of being
> expanded inline.
>
> There is currently no way to have a separate memcpy pattern. I know from
> experience with expansion of memcmp on power that lengths on the order of
> hundreds of bytes are needed before the function call overhead is overcome by
> optimized glibc code. But we need the memcpy guarantee of non-overlapping
> arguments to make that happen, as we don't want to do a runtime overlap test.
>
> There is some analysis that happens in gimple_fold_builtin_memory_op() that
> determines when memmove calls cannot have an overlap between the arguments and
> converts them into memcpy() which is nice.
>
> However, in builtins.c expand_builtin_memmove() does not actually do the
> expansion using the movmem pattern. This is why a memmove() call that cannot
> be converted to memcpy() by gimple_fold_builtin_memory_op() is not expanded
> and we call glibc memmove(). Only expand_builtin_memcpy() actually uses the
> movmem pattern.
>
> So here's my proposed set of fixes:
>  * Add new optab entries for nonoverlapping_memcpy and overlapping_memmove
>    cases.
>  * The movmem optab will continue to be treated exactly as it is today so
>    that ports that might have a broken movmem pattern that doesn't actually
>    handle the overlap cases will continue to work.
>  * expand_builtin_memmove() needs to actually do the memmove() expansion.
>  * expand_builtin_memcpy() needs to use cpymem. Currently this happens down in
>    emit_block_move_via_movmem() so some functions might need to be renamed.
>  * ports can then add the new overlapping move and nonoverlapping copy
>    expanders and will get better expansion of both memmove and memcpy
>    functions.
>
> I'd be interested in any comments about pieces of this machinery that need to
> work a certain way, or other related issues that should be addressed in
> between expand_builtin_memcpy() and emit_block_move_via_movmem().

I wonder if introducing a __builtin_memmove_with_hints specifying whether
src < dst or src > dst or unknown, and/or a safe block size where that
doesn't matter, would help?  It can then be safely expanded to memmove() or
to specific inline code.
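
To make the idea concrete, here is a rough C sketch of what such a hinted
move could boil down to. Everything in it is an assumption for illustration
only: the enum, its values and move_with_hint() are hypothetical names, not
an existing GCC builtin or optab, and the byte loops merely stand in for
whatever inline sequence a port would emit. The safe-block-size hint is left
out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hint values; not part of GCC or any existing API. */
enum move_hint {
    MOVE_HINT_UNKNOWN,   /* relative order unknown: defer to memmove() */
    MOVE_HINT_DST_BELOW, /* dst < src: a forward copy is overlap-safe */
    MOVE_HINT_DST_ABOVE  /* dst > src: a backward copy is overlap-safe */
};

/* Illustrative stand-in for an inline expansion driven by a direction hint. */
static void *move_with_hint(void *dst, const void *src, size_t n,
                            enum move_hint hint)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    switch (hint) {
    case MOVE_HINT_DST_BELOW:
        /* Each source byte is read before a later store can overwrite it. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        break;
    case MOVE_HINT_DST_ABOVE:
        /* Copy from the end so the tail of src is read before it is hit. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
        break;
    default:
        /* No usable hint: let the library decide at run time. */
        memmove(dst, src, n);
        break;
    }
    return dst;
}
```

With a known direction no runtime overlap test is needed; only the unknown
case still pays for the library call.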

Richard.

> Thanks!
>    Aaron
>
> --
> Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>
