https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88809

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-04-16
     Ever confirmed|0                           |1

--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
I would like to work on this problem. I've read Peter's very detailed analysis
on Stack Overflow, and I have a first couple of questions and observations:

1) I would suggest removing the use of 'rep scasb' entirely; even for -Os the
price paid is quite high.
2) I've added strlen instrumentation for -fprofile-{generate,use} and collected
SPEC CPU2006 statistics for train runs:

Benchmark         strlen calls  executed strlen calls  total executions  avg. strlen
400.perlbench              102                     39           2358804        10.97
401.bzip2                    0                      0                 0
403.gcc                    144                     21              4081          9.3
410.bwaves                   0                      0                 0
416.gamess                   0                      0                 0
429.mcf                      0                      0                 0
433.milc                     3                      0                 0
434.zeusmp                   0                      0                 0
435.gromacs                 86                      7                92        12.46
436.cactusADM              110                     46             61788        10.61
437.leslie3d                 0                      0                 0
444.namd                     0                      0                 0
445.gobmk                   41                      7             75196         2.01
447.dealII                   3                      0                 0
450.soplex                   8                      6           1161517        25.59
453.povray                  67                     25             54584        33.25
454.calculix                54                      0                 0
456.hmmer                   93                     10                52         15.1
458.sjeng                    0                      0                 0
459.GemsFDTD                 0                      0                 0
462.libquantum               0                      0                 0
464.h264ref                 12                      1                 1      14274.0
465.tonto                    0                      0                 0
470.lbm                      0                      0                 0
471.omnetpp                 50                     15          24291732         9.79
473.astar                    0                      0                 0
481.wrf                     42                     15             20490         9.41
482.sphinx3                 23                     11            402963         1.61
483.xalancbmk               27                      3               160        13.04

Columns: benchmark name, number of strlen calls in the benchmark's code, number
of those calls that were executed during the train run, total number of strlen
executions, and average string length returned by strlen.

Note: the 14274.0 value for 464.h264ref is correct:

  content_76 = GetConfigFileContent (filename_53);
  _7 = strlen (content_76);

Based on these numbers, the average string on which strlen is called is quite
short (mostly <32B).

3) The assumption that most strlen arguments have a known 16B alignment is
quite optimistic. As mentioned, {m,c}alloc returns memory aligned to that, but
strlen is most commonly called on a generic character pointer for which we
can't prove the alignment.

4) Peter's suggested asm expansion assumes such alignment. I'd expect somewhat
more complex code for the general alignment case?

5) A strlen call has the advantage that, even when the code is compiled with
-O2 -march=x86-64 (typical distribution options), glibc can use ifunc to
dispatch to an optimized implementation.

6) The decision code in ix86_expand_strlen looks strange to me:

bool
ix86_expand_strlen (rtx out, rtx src, rtx eoschar, rtx align)
{
  rtx addr, scratch1, scratch2, scratch3, scratch4;

  /* The generic case of strlen expander is long.  Avoid it's
     expanding unless TARGET_INLINE_ALL_STRINGOPS.  */

  if (TARGET_UNROLL_STRLEN && eoschar == const0_rtx && optimize > 1
      && !TARGET_INLINE_ALL_STRINGOPS
      && !optimize_insn_for_size_p ()
      && (!CONST_INT_P (align) || INTVAL (align) < 4))
    return false;

That explains why we generate 'rep scasb' for -O1.
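
To make the effect of that condition easier to follow, here is a minimal standalone model of it in C. The struct and its field names are hypothetical stand-ins for the GCC macros and arguments in the quoted snippet; only the boolean logic is taken from it. The model says nothing about what the expander does after the early return, since that code is not quoted here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state tested in ix86_expand_strlen. */
struct strlen_ctx {
    bool unroll_strlen;        /* TARGET_UNROLL_STRLEN */
    bool inline_all_stringops; /* TARGET_INLINE_ALL_STRINGOPS */
    bool insn_for_size;        /* optimize_insn_for_size_p () */
    bool eos_is_nul;           /* eoschar == const0_rtx */
    int  optimize;             /* optimization level, e.g. 1 for -O1 */
    bool align_known;          /* CONST_INT_P (align) */
    int  align;                /* INTVAL (align), valid if align_known */
};

/* Returns true when the quoted condition holds, i.e. when the expander
   returns false early and does not expand strlen inline.  */
static bool expander_bails_out(const struct strlen_ctx *c)
{
    return c->unroll_strlen && c->eos_is_nul && c->optimize > 1
           && !c->inline_all_stringops
           && !c->insn_for_size
           && (!c->align_known || c->align < 4);
}
```

With `optimize == 1` the `optimize > 1` term is false, so the early return is never taken and the function proceeds to the inline expansion, whatever the alignment.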

My suggestions:
- I would use a strlen call in all situations.
- Maybe I would instrument strlen calls in -fprofile-generate/use, and if
  there's a strlen call with a very small average size and a known 4B
  alignment, I would generate a 4B loop.
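
For illustration, a rough C sketch of what a 4B loop could look like when the alignment is not known in advance. This is not GCC's expansion and not Peter's suggested asm; `strlen_4b` is a hypothetical name. It uses the well-known "has a zero byte" bit trick and a byte-wise prologue to reach 4B alignment. Note the assumption: the aligned word load may read up to 3 bytes past the terminator, which is outside what ISO C guarantees even though an aligned 4B load cannot cross a page boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a 4-byte-at-a-time strlen.  */
static size_t strlen_4b(const char *s)
{
    const char *p = s;

    /* Prologue: step byte by byte until p is 4-byte aligned.  */
    while (((uintptr_t)p & 3) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Main loop: one aligned 4-byte load per iteration.  */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        /* Nonzero if and only if some byte of w is zero.  */
        if ((w - 0x01010101u) & ~w & 0x80808080u)
            break;
        p += 4;
    }

    /* Epilogue: find the zero byte inside the last word.  */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

When the 4B alignment is known at compile time, the prologue disappears and only the main loop and epilogue remain, which is the case the suggestion above targets.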

Thoughts?
