https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88809
Martin Liška <marxin at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2019-04-16
Ever confirmed|0 |1
--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
I would like to work on this problem. I've read Peter's very detailed analysis
on Stack Overflow and I have a first couple of questions and observations:
1) I would suggest removing the use of 'rep scasb' altogether; even for -Os the
price paid is quite high.
2) I've added strlen instrumentation for -fprofile-{generate,use} and collected
SPEC2006 statistics for the train runs:
Benchmark        strlen calls  executed strlen calls  total executions  avg. strlen
400.perlbench             102                     39           2358804        10.97
401.bzip2                   0                      0                 0
403.gcc                   144                     21              4081          9.3
410.bwaves                  0                      0                 0
416.gamess                  0                      0                 0
429.mcf                     0                      0                 0
433.milc                    3                      0                 0
434.zeusmp                  0                      0                 0
435.gromacs                86                      7                92        12.46
436.cactusADM             110                     46             61788        10.61
437.leslie3d                0                      0                 0
444.namd                    0                      0                 0
445.gobmk                  41                      7             75196         2.01
447.dealII                  3                      0                 0
450.soplex                  8                      6           1161517        25.59
453.povray                 67                     25             54584        33.25
454.calculix               54                      0                 0
456.hmmer                  93                     10                52         15.1
458.sjeng                   0                      0                 0
459.GemsFDTD                0                      0                 0
462.libquantum              0                      0                 0
464.h264ref                12                      1                 1      14274.0
465.tonto                   0                      0                 0
470.lbm                     0                      0                 0
471.omnetpp                50                     15          24291732         9.79
473.astar                   0                      0                 0
481.wrf                    42                     15             20490         9.41
482.sphinx3                23                     11            402963         1.61
483.xalancbmk              27                      3               160        13.04
Columns: benchmark name, number of strlen calls in the benchmark, number of
strlen calls that were executed during the train run, total number of strlen
executions, average string length.
Note: the 14274.0 value for 464.h264ref is correct:
content_76 = GetConfigFileContent (filename_53);
_7 = strlen (content_76);
Based on the numbers, the average string on which strlen is called is quite
short (<32B).
3) The assumption that most strlen arguments have a known 16B alignment is
quite optimistic. As mentioned, {m,c}alloc returns memory aligned to that, but
strlen is most commonly called on a generic character pointer for which we
can't prove the alignment.
4) Peter's suggested asm expansion assumes such alignment. I expect the code
would be somewhat more complex for the general-alignment case?
5) A strlen call has the advantage that, even when the code is compiled with
-O2 -march=x86-64 (typical distribution options), glibc can use ifunc to
dispatch to an optimized implementation.
6) The decision code in ix86_expand_strlen looks strange to me:
bool
ix86_expand_strlen (rtx out, rtx src, rtx eoschar, rtx align)
{
  rtx addr, scratch1, scratch2, scratch3, scratch4;
  /* The generic case of strlen expander is long.  Avoid it's
     expanding unless TARGET_INLINE_ALL_STRINGOPS.  */
  if (TARGET_UNROLL_STRLEN && eoschar == const0_rtx && optimize > 1
      && !TARGET_INLINE_ALL_STRINGOPS
      && !optimize_insn_for_size_p ()
      && (!CONST_INT_P (align) || INTVAL (align) < 4))
    return false;
That explains why we generate 'rep scasb' for -O1.
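To illustrate points 3) and 4), here is a rough C sketch of what handling an
arbitrary alignment might look like: a byte-wise prologue until the pointer is
16B aligned, followed by an aligned SSE2 loop. This is not Peter's asm and not
what GCC emits; the name strlen_any_align is hypothetical, and the scalar
fallback exists only so the sketch builds on targets without SSE2. The aligned
loads can read bytes past the terminator, which is safe at the page level but
formally outside what ISO C guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper for illustration only; not GCC's inline expansion.  */
size_t
strlen_any_align (const char *s)
{
  const char *p = s;
#if defined(__SSE2__)
  /* Prologue: go byte by byte until p is 16-byte aligned.  This is the
     extra code a general-alignment expansion has to pay for.  */
  while (((uintptr_t) p & 15) != 0)
    {
      if (*p == '\0')
        return (size_t) (p - s);
      p++;
    }

  /* Main loop: an aligned 16-byte load never crosses a page boundary,
     although it may read bytes that lie after the terminator.  */
  const __m128i zero = _mm_setzero_si128 ();
  for (;;)
    {
      __m128i chunk = _mm_load_si128 ((const __m128i *) p);
      int mask = _mm_movemask_epi8 (_mm_cmpeq_epi8 (chunk, zero));
      if (mask != 0)
        /* The lowest set bit marks the first zero byte in the chunk.  */
        return (size_t) (p - s) + (size_t) __builtin_ctz ((unsigned) mask);
      p += 16;
    }
#else
  /* Assumed fallback for non-SSE2 targets: plain byte loop.  */
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
#endif
}
```

With average lengths around 10 bytes, as in the table above, the prologue
alone may end up doing most of the work for an unaligned pointer.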
My suggestions:
- I would use a strlen call in all situations.
- Maybe I would instrument strlen calls in -fprofile-generate/use, and if
  there's a strlen call with a very small average size and a known 4B
  alignment, I would generate a 4B loop.
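For reference, a minimal C sketch of the kind of 4B loop meant in the last
suggestion, assuming the caller really can guarantee a 4B-aligned pointer. The
name strlen_aligned4 is hypothetical and this is not the RTL that
ix86_expand_strlen produces; it only shows the shape of the loop. As with any
word-at-a-time approach, the final load may read up to 3 bytes past the
terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only.  Assumes s is 4-byte aligned,
   so each 4-byte load stays within one page.  */
size_t
strlen_aligned4 (const char *s)
{
  const char *p = s;
  for (;;)
    {
      uint32_t w;
      memcpy (&w, p, sizeof w);   /* aliasing-safe 4-byte load */

      /* Classic zero-byte test: the result is nonzero if and only if
         at least one byte of w is zero.  */
      if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
        {
          /* Locate the terminator within this word (at most 3 steps).  */
          while (*p != '\0')
            p++;
          return (size_t) (p - s);
        }
      p += 4;
    }
}
```

For the very short strings seen in 445.gobmk and 482.sphinx3 (average length
around 2), such a loop would typically finish within the first word.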
Thoughts?