On Thu, 24 Nov 2016, Richard Biener wrote:
> On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote:
>
> > Hi,
> >
> > I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the avr
> > target. I found that the (dump) failure is because there are 4
> > instances of memcpy, while the testcase expects only 2 for a
> > non-strict align target like the avr.
> >
> > Comparing that with a dump generated by x64_64-pc-linux, I found that
> > the extra memcpy's come from the forwprop pass, when it replaces
> > strcat with strlen and memcpy. For x86_64, the memcpy generated gets
> > folded into a load/store in gimple_fold_builtin_memory_op. That
> > doesn't happen for the avr because len (2) happens to be bigger than
> > MOVE_MAX (1).
> >
> > The avr can only move 1 byte efficiently from reg <-> memory, but it's
> > more efficient to load and store 2 bytes than to call memcpy, so
> > MOVE_MAX_PIECES is set to 2.
> >
> > Given that gimple_fold_builtin_memory_op gets to choose between
> > leaving the memcpy call as is, or breaking it down to a by-pieces
> > move, shouldn't it use MOVE_MAX_PIECES instead of
> > MOV_MAX?
> >
> > That is what the below patch does, and that makes the test
> > pass. Does this sound right?
>
> No, as we handle both memcpy and memmove this way we rely on
> the whole storage fit in a single register so we do the
> right thing for overlapping memory.
So actually your patch doesn't chnage that, the ordering is ensured
by emitting a single GIMPLE load/store pair. There are only
four targets defining MOVE_MAX_PIECES, and one (s390) even has
a smaller MOVE_MAX_PIECES than MOVE_MAX (huh). AVR has larger
MOVE_MAX_PIECES than MOVE_MAX, but that seems to not make much
sense to me given their very similar description plus the
fact that AVR can only load a single byte at a time...
The x86 comment says
/* MOVE_MAX_PIECES is the number of bytes at a time which we can
move efficiently, as opposed to MOVE_MAX which is the maximum
number of bytes we can move with a single instruction.
which doesn't match up with
@defmac MOVE_MAX
The maximum number of bytes that a single instruction can move quickly
between memory and registers or between two memory locations.
@end defmac
note "quickly" here. But OTOH
@defmac MOVE_MAX_PIECES
A C expression used by @code{move_by_pieces} to determine the largest unit
a load or store used to copy memory is. Defaults to @code{MOVE_MAX}.
@end defmac
here the only difference is "copy memory". But we try to special
case the one load - one store case, not generally "copy memory".
So I think MOVE_MAX matches my intent when writing the code.
Richard.
> Richard.
>
> > Regards
> > Senthil
> >
> > Index: gcc/gimple-fold.c
> > ===================================================================
> > --- gcc/gimple-fold.c (revision 242741)
> > +++ gcc/gimple-fold.c (working copy)
> > @@ -703,7 +703,7 @@
> > src_align = get_pointer_alignment (src);
> > dest_align = get_pointer_alignment (dest);
> > if (tree_fits_uhwi_p (len)
> > - && compare_tree_int (len, MOVE_MAX) <= 0
> > + && compare_tree_int (len, MOVE_MAX_PIECES) <= 0
> > /* ??? Don't transform copies from strings with known length this
> > confuses the tree-ssa-strlen.c. This doesn't handle
> > the case in gcc.dg/strlenopt-8.c which is XFAILed for that
> >
> >
>
>
--
Richard Biener <[email protected]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB
21284 (AG Nuernberg)