On Tue, May 14, 2019 at 9:21 PM Aaron Sawdey <acsaw...@linux.ibm.com> wrote:
>
> GCC does not currently do inline expansion of overlapping memmove, nor does it
> have an expansion pattern to allow for non-overlapping memcpy, so I plan to add
> patterns and support to implement this in gcc 10 timeframe.
>
> At present memcpy and memmove are kind of entangled. Here's the current state
> of play:
>
> memcpy -> expand with movmem pattern
> memmove (no overlap) -> transform to memcpy -> expand with movmem pattern
> memmove (overlap) -> remains memmove -> glibc call
>
> There are several problems currently. If the memmove() arguments are in fact
> overlapping, then the expansion is actually not used which makes no sense and
> costs performance of calling a library function instead of inline expanding
> memmove() of small blocks.
>
> There is currently no way to have a separate memcpy pattern. I know from
> experience with expansion of memcmp on power that lengths on the order of
> hundreds of bytes are needed before the function call overhead is overcome by
> optimized glibc code. But we need the memcpy guarantee of non-overlapping
> arguments to make that happen, as we don't want to do a runtime overlap test.
>
> There is some analysis that happens in gimple_fold_builtin_memory_op() that
> determines when memmove calls cannot have an overlap between the arguments and
> converts them into memcpy() which is nice.
>
> However in builtins.c expand_builtin_memmove() does not actually do the
> expansion using the memmove pattern. This is why a memmove() call that cannot
> be converted to memcpy() by gimple_fold_builtin_memory_op() is not expanded and
> we call glibc memmove(). Only expand_builtin_memcpy() actually uses the memmove
> pattern.
>
> So here's my proposed set of fixes:
> * Add new optab entries for nonoverlapping_memcpy and overlapping_memmove
>   cases.
> * The movmem optab will continue to be treated exactly as it is today so
>   that ports that might have a broken movmem pattern that doesn't actually
>   handle the overlap cases will continue to work.
> * expand_builtin_memmove() needs to actually do the memmove() expansion.
> * expand_builtin_memcpy() needs to use cpymem. Currently this happens down in
>   emit_block_move_via_movmem() so some functions might need to be renamed.
> * ports can then add the new overlapping move and nonoverlapping copy
>   expanders and will get better expansion of both memmove and memcpy
>   functions.
>
> I'd be interested in any comments about pieces of this machinery that need to
> work a certain way, or other related issues that should be addressed in
> between expand_builtin_memcpy() and emit_block_move_via_movmem().
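
To make the "inline expanding memmove() of small blocks" case concrete, here is a minimal C sketch of what an overlap-safe expansion of a fixed 16-byte move could look like at the source level. It is only an illustration, not GCC's actual expansion: the helper name move16_overlap_safe is hypothetical, and the idea shown is simply that all loads complete before any store, so no runtime overlap test is needed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or glibc): move exactly 16 bytes.
   Every byte of the source is loaded into temporaries before any byte of
   the destination is written, so the result is correct even when the two
   regions overlap. */
static inline void move16_overlap_safe(void *dst, const void *src)
{
    uint64_t lo, hi;

    /* Load phase: these memcpy calls copy into distinct locals, so their
       arguments can never overlap. */
    memcpy(&lo, src, 8);
    memcpy(&hi, (const char *)src + 8, 8);

    /* Store phase: the source has been fully read by now. */
    memcpy(dst, &lo, 8);
    memcpy((char *)dst + 8, &hi, 8);
}
```

This load-everything-then-store approach only works while the block fits in registers; larger overlapping moves need a copy direction chosen from the relative position of the two pointers.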

I wonder if introducing a __builtin_memmove_with_hints specifying whether
src < dst or src > dst or unknown, and/or a safe block size where that
doesn't matter, would help? It can then be safely expanded to memmove()
or to specific inline code.

Richard.

> Thanks!
> Aaron
>
> --
> Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com
> 050-2/C113 (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>
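
As a rough illustration of the hinted-builtin idea in the reply above, the following C sketch models only the direction hint as an ordinary function. Everything here is an assumption made for illustration: no __builtin_memmove_with_hints exists in GCC, the enum and function names are hypothetical, and the "safe block size" part of the suggestion is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hint values; nothing like this exists in GCC. */
enum move_hint {
    MOVE_HINT_UNKNOWN,        /* relative order of src and dst is not known */
    MOVE_HINT_SRC_BELOW_DST,  /* src < dst: a backward copy is safe */
    MOVE_HINT_SRC_ABOVE_DST   /* src > dst: a forward copy is safe */
};

/* Hypothetical model of a hinted memmove. With a known direction the copy
   can be done with a simple loop and no overlap test; without one it falls
   back to the library memmove(). */
static void *memmove_with_hints(void *dst, const void *src, size_t n,
                                enum move_hint hint)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    switch (hint) {
    case MOVE_HINT_SRC_ABOVE_DST:
        /* Forward copy: each source byte is read before it can be
           overwritten, because the destination trails the source. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    case MOVE_HINT_SRC_BELOW_DST:
        /* Backward copy: start from the last byte for the same reason. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
        return dst;
    default:
        return memmove(dst, src, n);
    }
}
```

A caller that knows it is shifting data toward higher addresses within one buffer would pass MOVE_HINT_SRC_BELOW_DST; a wrong hint with overlapping regions would corrupt the data, which is why the unknown case has to keep the memmove() fallback.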