On Tue, Jul 2, 2019 at 3:48 AM Martin Sebor <mse...@gmail.com> wrote:
>
> Attached is a more complete solution that fully resolves the bug
> report by avoiding a warning in cases like:
>
>    char a[32], b[8];
>
>    void f (void)
>    {
>      if (strlen (a) < sizeof b - 2)
>        snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
>    }
>
> It does that by having get_range_strlen_dynamic use the EVRP
> information for non-constant strlen calls: EVRP has recorded
> that the result is less sizeof b - 2 and so the function
> returns this limited range of lengths to snprintf which can
> then avoid warning.  It also improves the checking and can
> find latent bugs it missed before (like the off-by-one in
> print-rtl.c).
>
> Besides improving the accuracy of the -Wformat-overflow and
> truncation warnings this can also result in better code.
> So far this only benefits snprintf but there may be other
> opportunities to string functions as well (e.g., strcmp or
> memcmp).
>
> Jeff, I looked into your question/suggestion for me last week
> when we spoke, to introduce some sort of a recursion limit for
> get_range_strlen_dynamic.  It's easily doable but before we go
> down that path I did some testing to see how bad it can get and
> to compare it with get_range_strlen.  Here are the results for
> a few packages.  The dept is the maximum recursion depth, success
> and fail are the numbers of successful and failed calls to
> the function:
>
>    binutils-gdb:
>                                depth   success     fail
>      get_range_strlen:           319      8302    21621
>      get_range_strlen_dynamic:    41      1503      161
>    gcc:
>      get_range_strlen:            46      7211    11365
>      get_range_strlen_dynamic:    23     10272       12
>    glibc:
>      get_range_strlen:            76      2840    11422
>      get_range_strlen_dynamic:    51      1186       46
>    elfutils:
>      get_range_strlen:            33      1198     2516
>      get_range_strlen_dynamic:    31       685       36
>    kernel:
>      get_range_strlen:            25      5299    11698
>      get_range_strlen_dynamic:    31      9911      122
>
> Except the first case of get_range_strlen (I haven't looked into
> it yet), it doesn't seem too bad, and with just one exception it's
> better than get_range_strlen.  Let me know if you think it's worth
> adding a parameter (I assume we'd use it for both functions) and
> what to set it to.

I think we want to avoid adding code to GCC that might turn out
quadratic for artificial testcases. History tells us that eventually
somebody will find a real-world testcase that hits it.  I would
suggest to not place a limit on the depth but on the number
of SSA_NAME_DEF_STMTs visited.  Not sure to what this
would turn out but I guess using a limit around 500 would work?
For the data above what's the biggest depth / highest number
of stmts we visit when we are able to compute a useful limit
and what is it when including failure?  The individual number
of successes/fails are not too interesting.  Maybe even provide
a histogram for the successful cases depth/stmt count?

Otherwise I like the merging - can you annotate the passes.def
entries with a comment indicating what the pass parameter
designates?  Like /* do optimize */?

Since we have a parallelization GSoC project running not
adding more global state to passes would be nice as well,
strlen has a lot of it that could be in a per invocation state
class.  Not a requirement for this patch though.

Besides the limiting no further comments from me - I'll leave
the details to Jeff.

Thanks a lot for doing this!
Richard.

>
> On 6/11/19 5:26 PM, Martin Sebor wrote:
> > The sprintf and strlen passes both work with strings but
> > run independently of one another and don't share state.  As
> > a result, lengths of strings dynamically created by functions
> > that are available to the strlen pass are not available to
> > sprintf.  Conversely, lengths of strings formatted by
> > the sprintf functions are not made available to the strlen
> > pass.  The result is less efficient code, poor diagnostics,
> > and ultimately less than optimal user experience.
> >
> > The attached patch is the first step toward rectifying this
> > design problem.  It integrates the two passes into one and
> > exposes the string length data managed by the strlen pass to
> > the sprintf "module."  (It does not expose any sprintf data
> > to the strlen pass yet.)
> >
> > The sprintf pass invocations in passes.def have been replaced
> > with those of strlen.  The first "early" invocation is only
> > effective for the sprintf module to enable warnings without
> > optimization.  The second invocation is "late" and enables
> > both warnings and the sprintf and strlen optimizations unless
> > explicitly disabled via -fno-optimize-strlen.
> >
> > Since the strlen optimization is the second invocation of
> > the pass tests that scan the strlen dump have been adjusted
> > to look for the "strlen2" dump file.
> >
> > The changes in the patch are mostly mechanical.  The one new
> > "feature" worth calling out is the get_range_strlen_dynamic
> > function.  It's analogous to get_range_strlen in gimple-fold.c
> > except that it makes use of the strlen "dynamically" obtained
> > string length info.  In cases when the info is not available
> > the former calls the latter.
> >
> > The other new functions in tree-ssa-strlen.c are
> > check_and_optimize_call and handle_integral_assign: I added
> > them only to keep the check_and_optimize_stmt function from
> > getting excessively long and hard to follow.  Otherwise,
> > the code in the functions is unchanged.
> >
> > There are a number of further enhancements to consider as
> > the next steps:
> >   *  enhance the strlen module to determine string length range
> >      information from integer variable to which it was previously
> >      assigned (necessary to fully resolve pr83431 and pr90625),
> >   *  make the sprintf/snprintf string length results directly
> >      available to the strlen module,
> >   *  enhance the sprintf module to optimize snprintf(0, 0, fmt)
> >      calls with fmt strings consisting solely of plain characters
> >      and %s directives to series of strlen calls on the arguments,
> >   *  and more.
> >
> > Martin
>

Reply via email to