I'm looking at the -mmemcpy-strategy option to see if I could use it
to better exercise
warnings like -Wstringop-overflow and avoid the persistent regressions
on secondary targets like arm that don't inline memory built-ins.  As
an experiment, I thought I'd try to use the option to disable memcpy
inlining altogether as that's what the option is documented to do:

  Override the internal decision heuristic to decide if __builtin_memcpy
  should be inlined and what inline algorithm to use when the expected
  size of the copy operation is known

The manual says that in -mmemcpy-strategy=strategy, the argument is
a comma-separated list of alg:max_size:dest_align triplets, with alg
being one of the algorithms listed under -mstringop-strategy.  The
documented algs are:

  rep_byte
  rep_4byte
  rep_8byte
  byte_loop
  loop
  unrolled_loop
  libcall

The max_size component is said to specify the max byte size with
which inline algorithm alg is allowed, but the dest_align component
is undocumented.

Looking through the tests, for example, I found:

  -mmemcpy-strategy=libcall:-1:noalign
  -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align
  -mmemcpy-strategy=libcall:-1:align
  -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align
  -mmemcpy-strategy=rep_8byte:-1:noalign
  -mmemcpy-strategy=vector_loop:-1:align

The vector_loop alg is not documented either, but it looks like
dest_align can be either align or noalign (anything else?).
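
To make my reading of the syntax concrete, here is a small C sketch
that splits a strategy string into triplets.  This is not GCC's parser
and I haven't compared it against the real one: the names
parse_strategy and struct strategy_triplet are made up, and accepting
only align and noalign for dest_align is an assumption based on the
tests above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type (not GCC's): one alg:max_size:dest_align triplet.  */
struct strategy_triplet
{
  char alg[32];    /* e.g. "libcall" or "vector_loop" */
  long max_size;   /* the tests use -1, apparently for "no upper bound" */
  int noalign;     /* 1 for "noalign", 0 for "align" */
};

/* Parse SPEC into at most MAX_OUT triplets stored in OUT.  Return the
   number of triplets parsed, or -1 if SPEC doesn't match my reading
   of the syntax.  */
int
parse_strategy (const char *spec, struct strategy_triplet *out, int max_out)
{
  assert (spec != NULL && out != NULL);

  int n = 0;
  const char *p = spec;

  for (;;)
    {
      if (n == max_out)
        return -1;

      /* alg: a nonempty name terminated by ':'.  */
      size_t k = strcspn (p, ":,");
      if (k == 0 || p[k] != ':' || k >= sizeof out[n].alg)
        return -1;
      memcpy (out[n].alg, p, k);
      out[n].alg[k] = '\0';

      /* max_size: a decimal integer terminated by ':'.  */
      char *end;
      out[n].max_size = strtol (p + k + 1, &end, 10);
      if (end == p + k + 1 || *end != ':')
        return -1;
      p = end + 1;

      /* dest_align: assumed to be either "noalign" or "align".  */
      if (strncmp (p, "noalign", 7) == 0)
        {
          out[n].noalign = 1;
          p += 7;
        }
      else if (strncmp (p, "align", 5) == 0)
        {
          out[n].noalign = 0;
          p += 5;
        }
      else
        return -1;

      n++;

      /* Either the string ends here or another triplet follows.  */
      if (*p == '\0')
        return n;
      if (*p != ',')
        return -1;
      p++;
    }
}
```

With this, "vector_loop:3000:align,libcall:-1:align" parses as two
triplets, and a string with any other dest_align value is rejected.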

So with that, I tried -mmemcpy-strategy=libcall:-1:noalign as my
first attempt (see below).  Clearly, that doesn't work: the 4-byte
copy still shows up in the optimized dump as a load and a store
rather than a call to memcpy.  Without looking at the code, my guess
is that the inlining heuristic the manual talks about has to do with
the expansion of memcpy calls that have not been transformed into
MEM_REFs by the middle end.  Is that right?

If yes, should the manual be updated to make this clear, and to
explain the undocumented components?  (I can put together a patch
if someone can confirm that I understand this right.)
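
For what it's worth, this is the kind of test I'd use to check that
guess.  It's only a sketch: I haven't run it with the option, and the
comments state what I'd expect, not what I've seen.  In particular,
that a 256-byte copy is too big to be folded into a MEM_REF is my
assumption.  The function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Small constant size: the case from my transcript, which the middle
   end turns into a load and a store before expansion.  */
void copy_small (void *d, const void *s)
{
  __builtin_memcpy (d, s, 4);
}

/* Larger constant size: my assumption is that this is not folded and
   so reaches the expander as a memcpy call with a known size.  */
void copy_large (void *d, const void *s)
{
  __builtin_memcpy (d, s, 256);
}

/* Size unknown at compile time: nothing to fold, but also no known
   size for the strategy to key on.  */
void copy_var (void *d, const void *s, size_t n)
{
  __builtin_memcpy (d, s, n);
}

/* Runtime sanity check so the file can be run, not just compiled.  */
void check_copies (void)
{
  unsigned char src[256], dst[256];
  for (int i = 0; i < 256; i++)
    src[i] = (unsigned char) i;

  __builtin_memset (dst, 0, sizeof dst);
  copy_small (dst, src);
  assert (dst[3] == 3 && dst[4] == 0);

  copy_large (dst, src);
  assert (__builtin_memcmp (dst, src, sizeof dst) == 0);

  __builtin_memset (dst, 0, sizeof dst);
  copy_var (dst, src, 100);
  assert (dst[99] == 99 && dst[100] == 0);
}
```

If my guess is right, I'd expect the option to make a difference for
copy_large but not for copy_small.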

$ cat t.c && gcc -O2 -S -Wall -Wextra -Wpedantic -fdump-tree-optimized=/dev/stdout -mmemcpy-strategy=libcall:-1:noalign t.c

  void f (void *d, const void *s)
  {
    __builtin_memcpy (d, s, 4);
  }

;; Function f (f, funcdef_no=0, decl_uid=1911, cgraph_uid=1, symbol_order=0)

  f (void * d, const void * s)
  {
    unsigned int _3;

    <bb 2> [local count: 1073741824]:
    _3 = MEM[(char * {ref-all})s_2(D)];
    MEM[(char * {ref-all})d_4(D)] = _3;
    return;
  }

Martin
