Re: Constraints and branching in -fanalyzer

brian.sobulefsky via Gcc Tue, 23 Feb 2021 09:27:15 -0800

I can post it to the patches list with an explanation that it is not a full 
solution
to the PR. I actually put the logic directly into 
constraint_manager::eval_condition
when I was experimenting but I think I will make a new subroutine for you for 
"folding"
constraints. It is not overly complicated, but there are a lot of cases based on
whether the binop_svalue is on the right or left of the eval operator, and
whether the value we recur on is on the left or right in the binop. A new
subroutine can be a place for folding other svalues, such as unaryop or
widening. As of now, for example, if we have a constraint on an svalue s, and 
ask
about the same constraint on a unaryop svalue u_s with operator NOP, there is no
reason to expect the right answer from the constraint manager.




Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, February 22, 2021 5:57 PM, David Malcolm <dmalc...@redhat.com> wrote:

> On Sun, 2021-02-21 at 05:27 +0000, brian.sobulefsky wrote:
>
> > To be clear, I only solved the lesser problem
> > if(idx-- > 0)
> >   __analyzer_eval(idx >= 0);
> > which is a stepping stone problem. I correctly surmised that this was
> > failing
> > (with the prefix operator and -= operator working as expected)
> > because the
> > condition that is constrainted in the postfix problem is the old
> > value for idx
> > while the condition being evaluated is the new value. I can send you
> > a patch,
> > but the short version is the initial value of idx is constrained,
> > then a binop_svalue
> > is stored and eventually ends up in __analyzer_eval. Adding a case in
> > constraint_manager::eval_condition to take apart binop svalues and
> > recur
> > the way you are imagining would be necessary is basically all that is
> > needed
> > to solve that one. Currently, the constraint_manager is just looking
> > at
> > that binop_svalue and determining it does not know any rules for it,
> > because
> > the rule it knows about is actually for one of its arguments.
> > I was hoping this would be it for the original loop problem, but like
> > I said,
> > it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I
> > found out happens for the other operators in a for loop as well. The
> > first
> > true is my binop_svalue handler, but the second UNKNOWN is the
> > merging of
> > the bottom of the loop back with the entry point.
> > Since that is fairly abstract, when I found the case I told you
> > about,
> > I decided to see if I could fix it, because merging >0 with =0 into
> >
> > > =0
> > > for a linear CFG should not be too hard.
>
> I think it's probably best if you post the patch that you have so far
> (which as I understand it fixes the non-looping case), since it sounds
> like a useful base to work from.
>
> Thanks
> Dave
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Saturday, February 20, 2021 12:42 PM, David Malcolm <
> > dmalc...@redhat.com> wrote:
> >
> > > [Moving this discussion from offlist to the GCC mailing list (with
> > > permission) and tweaking the subject]
> > > On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote:
> > >
> > > > Yeah, its a lot to take in. For the last one, it was just about
> > > > storing and retrieving data and I ignored everything else about
> > > > the
> > > > analyzer, and that was hard enough.
> > >
> > > Well done on making it this far; I'm impressed that you're diving
> > > into
> > > some of the more difficult aspects of this code, and seem to be
> > > coping.
> > >
> > > > I am working on PR94362, which originates from a false positive
> > > > found
> > > > compiling openssl. It effectivly amounted to not knowing that idx
> > > >
> > > > > =
> > > > > 0 within the loop for(; idx-- >0 ;).
> > > > > It turns out there are two problems here. One has to do with the
> > > > > postfix operator, and yes, the analyzer currently does not know
> > > > > that
> > > > > i >= 0 within an if block like if(idx-- > 0). That problem was
> > > > > easy
> > > > > and I got it to work within a few days with a relatively simple
> > > > > patch. I thought I was going to be submitting again.
> > > > > The second part is hard. It has to do with state merging and has
> > > > > nothing to do with the postfix operator. It fails for all sorts
> > > > > of
> > > > > operators when looping. In fact, the following fails:
> > > > > if(idx < 0)
> > > > >   idx = 0;
> > > > > __analyzer_eval(idx >= 0);
> > > > > which is devastating if you are hoping the analyzer can
> > > > > "understand"
> > > > > a loop. Even with my first fix (which gives one TRUE when run on
> > > > > a
> > > > > for loop), there is the second "iterated" pass, which ends up
> > > > > with a
> > > > > widening_svalue (I'm still trying to wrap my head around that one
> > > > > too), that gives an UNKNOWN
> > >
> > > FWIW "widening" in this context is taken from abstract
> > > interpretation;
> > > see e.g. the early papers by Patrick and Radhia Cousot; the idea is
> > > to
> > > jump ahead of an infinitely descending chain of values to instead
> > > go
> > > directly to a fixed point in a (small) finite number of steps.
> > > (I've
> > > not attempted the narrowing approach, which refines it further to
> > > get a
> > > tighter approximation).
> > >
> > > > So I am trying to follow how states are merged, but that means I
> > > > need
> > > > to at least sort of understand the graphs. I do know that the
> > > > actual
> > > > merging follows in the PK_AFTER_SUPERNODE branch, with the call
> > > > to
> > > > node->on_edge, which eventually gets you to maybe_update_for_edge
> > > > and
> > > > the for_each_fact iterator.
> > >
> > > I have spent far too many hours poring over graph dumps from the
> > > analyzer, and yes, grokking the state merging is painful, and I'm
> > > sure
> > > there are many bugs.
> > > Are you familiar with the various dump formats for the graph? In
> > > particular the .dot ones? FWIW I use xdot.py for viewing them:
> > > https://github.com/jrfonseca/xdot.py
> > > (and indeed am the maintainer of the Fedora package for it); it has
> > > a
> > > relatively quick and scalable UI for navigating graphs, but at some
> > > point even it can't cope.
> > > I started writing a dedicated visualizer that uses some of
> > > xdot.py's
> > > classes:
> > > https://github.com/davidmalcolm/gcc-analyzer-viewer
> > > but it's early days for that.
> > >
> > > > I watched a merge in the debugger yesterday for the if example
> > > > above
> > > > and watched the unknown_svalues be made for the merged state, but
> > > > it
> > > > was still too much to take in all at once for me to know where
> > > > the
> > > > solution is.
> > >
> > > One other nasty problem with the state merging code is that any
> > > time I
> > > touch it, there are knock-on effects where other things break (e.g.
> > > loop analysis stops converging), and as I fix those, yet more
> > > things
> > > break, which is demoralizing (3 steps forward, 2 steps back).
> > > Finding ways to break problems down into smaller chunks seems to be
> > > the
> > > key here.
> > > It sounds like you're making progress with the:
> > > if (idx < 0)
> > > idx = 0;
> > > __analyzer_eval (idx >= 0);
> > > case. Does your fix at least work outside of a loop, without
> > > regressing things? (Or, if it does, what regresses?) If so, then it
> > > could be turned into a minimal testcase and we could at least fix
> > > that.
> > > FWIW I've experimented with better ways to handle loops in the
> > > analyzer. One approach is that GCC already has its own loop
> > > analysis
> > > framework. At the point where -fanalyzer runs, the IR has captured
> > > the
> > > nesting structure of loops in the code, so we might want to make
> > > use of
> > > that in our heuristics. Sadly, as far as I can tell, any attempts
> > > to
> > > go beyond that and reuse GCC's scalar-value-evolution code (for
> > > handling loop bounds and iterations) require us to enable
> > > modification
> > > of the CFGs, which is a no-no for -fanalyzer.
> > > (Loop handling is one of the most important and difficult issues
> > > within
> > > the analyzer implementation. That said, in the last few days I've
> > > been
> > > ignoring it, and have been focusing instead on a rewrite of how we
> > > find
> > > the shortest feasibile path for each diagnostic, since there's
> > > another
> > > cluster of analyzer bugs relating to false negatives in that; I
> > > hope to
> > > land a big fix for feasibilty handling next week - and to finish
> > > reviewing your existing patch (sorry!))
> > > Hope this is helpful. I'm quoting the rest of our exchange below
> > > (for
> > > the mailing list)
> > > Dave
> > >
> > > > Sent with ProtonMail Secure Email.
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Friday, February 19, 2021 5:48 PM, David Malcolm <
> > > > dmalc...@redhat.com> wrote:
> > > >
> > > > > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote:
> > > > >
> > > > > > Without asking you to give a lecture on what is effectively
> > > > > > in
> > > > > > the
> > > > > > source, can you give a 30,000 ft. view on your intent between
> > > > > > the
> > > > > > "supers" and the "explodeds" (i.e. supergraph, supernode,
> > > > > > superedge
> > > > > > vs. exploded graph, exploded node, exploded edge). I
> > > > > > understand
> > > > > > that
> > > > > > each are a derived class from your dgraph one directory level
> > > > > > up,
> > > > > > I
> > > > > > mean, what was your design intent.
> > > > >
> > > > > Does:
> > > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html
> > > > > answer your question?
> > > > > The main point of the supergraph is to combine all of the CFGs
> > > > > into
> > > > > one
> > > > > big directed graph.
> > > > > The point of the exploded graph is to combine that with
> > > > > dataflow,
> > > > > or,
> > > > > at least, "interesting" aspects of data flow, where
> > > > > "interesting"
> > > > > roughly means "differences in state-machine state".
> > > > > IIRC, both of these are from
> > > > > "Precise Interprocedural Dataflow Analysis via Graph
> > > > > Reachability"
> > > > > (Thomas Reps, Susan Horwitz and Mooly Sagiv),
> > > > > though I'm not using the algorithm they're using, more the
> > > > > high-
> > > > > level
> > > > > ideas of the two above paragraphs.
> > > > > Having an exploded graph means I can generate sequences of
> > > > > events
> > > > > by
> > > > > considering paths through the exploded graph.
> > > > > Does this clarify things?
> > > > > (BTW, yes; I've seen your latest patch; I hope to get to it
> > > > > tomorrow)
> > > > > Dave
> > > > >
> > > > > > Sent with ProtonMail Secure Email.
> > > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm <
> > > > > > dmalc...@redhat.com> wrote:
> > > > > >
> > > > > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote:
> > > > > > >
> > > > > > > > Hi David,
> > > > > > > > I am having a hard time understanding how the
> > > > > > > > constraint_managers
> > > > > > > > update with various types of branching. Usually I can
> > > > > > > > figure
> > > > > > > > things
> > > > > > > > out by following it at a particular memory location (an
> > > > > > > > svalue or
> > > > > > > > region at a particular memory location never changes kind
> > > > > > > > and
> > > > > > > > can
> > > > > > > > be
> > > > > > > > read later and get the value stored there). I am being
> > > > > > > > defeated
> > > > > > > > by
> > > > > > > > the constraint_manager. I will watch two equivalence
> > > > > > > > classes
> > > > > > > > and
> > > > > > > > a
> > > > > > > > constraint be added and then on the next pass see that
> > > > > > > > there
> > > > > > > > are
> > > > > > > > no
> > > > > > > > classes or constraints in that manager. Is there an
> > > > > > > > obvious
> > > > > > > > reason
> > > > > > > > that this would happen?
> > > > > > >
> > > > > > > constraint_manager instances get copied a lot: typically at
> > > > > > > each
> > > > > > > new
> > > > > > > state you're working with a fresh instance that got created
> > > > > > > via
> > > > > > > the
> > > > > > > copy constructor from the previous state.
> > > > > > >
> > > > > > > > FWIW, I was expecting the way it worked to be that when
> > > > > > > > you
> > > > > > > > reach
> > > > > > > > a
> > > > > > > > branch at say, an if or a for loop condition, two
> > > > > > > > managers
> > > > > > > > are
> > > > > > > > created, one for the condition and one for the negation
> > > > > > > > of
> > > > > > > > the
> > > > > > > > condition. Is this be how it works?
> > > > > > >
> > > > > > > Roughly. The constraint_manager instances are "owned" by
> > > > > > > region_model
> > > > > > > instances, themselves within program_state instances.
> > > > > > > At a branch, within exploded_graph::process_node, for
> > > > > > > PK_AFTER_SUPERNODE it iterates through successor
> > > > > > > supernodes,
> > > > > > > creating a
> > > > > > > (next_point, next_state) pair for each outedge, and calls
> > > > > > > exploded_node::on_edge on them.
> > > > > > > This in turn calls program_state::on_edge, which applies
> > > > > > > the
> > > > > > > edge
> > > > > > > flags
> > > > > > > to the program state, and, in particular, the
> > > > > > > constraint_manager;
> > > > > > > either adding constraints, or rejecting the edge as
> > > > > > > infeasible.
> > > > > > > At
> > > > > > > an
> > > > > > > if statement, one edge will have EDGE_FLAG_TRUE, the other
> > > > > > > EDGE_FLAG_FALSE.
> > > > > > > The output of the supergraph .dot dump shows the edge
> > > > > > > flags.
> > > > > > > Hope that makes sense.
> > > > > > > (one drawback of this approach is we can't bifurcate state
> > > > > > > within a
> > > > > > > node, only at edges. Ideally I'd like to eventually fix
> > > > > > > that,
> > > > > > > but
> > > > > > > it
> > > > > > > would be a big reworking)
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Brian
> > > > > > > > Sent with ProtonMail Secure Email.

Re: Constraints and branching in -fanalyzer

Reply via email to