Hi,
I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the avr
target. I found that the (dump) failure is because there are 4
instances of memcpy, while the testcase expects only 2 for a
non-strict align target like the avr.
Comparing that with a dump generated for x86_64-pc-linux, I found that
the extra memcpy calls come from the forwprop pass, when it replaces
strcat with strlen and memcpy. For x86_64, the memcpy generated gets
folded into a load/store in gimple_fold_builtin_memory_op. That
doesn't happen for the avr because len (2) happens to be bigger than
MOVE_MAX (1).
The avr can only move 1 byte efficiently from reg <-> memory, but it's
more efficient to load and store 2 bytes than to call memcpy, so
MOVE_MAX_PIECES is set to 2.
Given that gimple_fold_builtin_memory_op gets to choose between
leaving the memcpy call as is, or breaking it down to a by-pieces
move, shouldn't it use MOVE_MAX_PIECES instead of MOVE_MAX?
That is what the patch below does, and it makes the test
pass. Does this sound right?
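
To make the length check concrete, here is a minimal sketch that models
only the comparison discussed above. The names SKETCH_MOVE_MAX,
SKETCH_MOVE_MAX_PIECES and folds_with_limit are hypothetical stand-ins,
not GCC identifiers, and the values are the avr ones quoted in this mail.
The real code in gimple_fold_builtin_memory_op checks more conditions
than this.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the avr target macros; the values are
   the ones quoted in this mail, not read from GCC headers.  */
enum { SKETCH_MOVE_MAX = 1, SKETCH_MOVE_MAX_PIECES = 2 };

/* Models only the "len <= limit" test.  The real fold also looks at
   alignment and other conditions, which are left out here.  */
static bool
folds_with_limit (unsigned long len, unsigned long limit)
{
  return len <= limit;
}
```

With len == 2, folds_with_limit (2, SKETCH_MOVE_MAX) is false, so the
memcpy call stays, while folds_with_limit (2, SKETCH_MOVE_MAX_PIECES)
is true, so the copy could become a load/store.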
Regards
Senthil
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c (revision 242741)
+++ gcc/gimple-fold.c (working copy)
@@ -703,7 +703,7 @@
src_align = get_pointer_alignment (src);
dest_align = get_pointer_alignment (dest);
if (tree_fits_uhwi_p (len)
- && compare_tree_int (len, MOVE_MAX) <= 0
+ && compare_tree_int (len, MOVE_MAX_PIECES) <= 0
/* ??? Don't transform copies from strings with known length this
confuses the tree-ssa-strlen.c. This doesn't handle
the case in gcc.dg/strlenopt-8.c which is XFAILed for that