[PATCH] D53701: [Analyzer] Record and process comparison of symbols instead of iterator positions in interator checkers

Artem Dergachev via Phabricator via cfe-commits Fri, 02 Nov 2018 15:29:34 -0700

NoQ added a comment.

Ok, so what this code does is, eg., for a call of `i1.operator==(i2)` that 
returns a symbol `$x` of type `bool`, it conserves the sacred checker-specific 
knowledge that `$x` is in fact equal to `$offset(i1) == $offset(i2)`, where 
`$offset(i)` is the offset symbol within `IteratorPosition` to which iterator 
`i` is mapped.


This looks like a renaming problem to me: what we really want to do is 
//rename// (i.e. update representation in the `SVal` hierarchy of) `$x` to 
`$offset(i1) == $offset(i2)`. And for now the single plausible approach to 
renaming problems in the Static Analyzer is to avoid them: //give the value a 
correct name from the start// so that you didn't have to rename it later.

What this means is that instead of waiting until `checkPostCall` to obtain `$x` 
and then trying to rename it into `$offset(i1) == $offset(i2)`, we should 
`evalCall` the comparison operator to return `$offset(i1) == $offset(i2)`. So 
that the symbol with the wrong name (i.e., `$x`) didn't appear in the program 
state in the first place.

The good thing about this approach is that it does not require any extra 
tracking at all - no comparison maps, no `evalAssume()`, nothing. The value is 
simply correct from the start.

As a side effect, you will have a chance (though still not forced) to avoid 
redundant invalidation of Store during evaluation of the operator call. This is 
the correct behavior for at least STL containers and probably for all 
containers ever created by mankind. Though of course you never know. I.e., what 
if the code under analysis measures complexity of vector sort procedure and 
increments a global counter every time two iterators are compared within that 
procedure? But at least for STL/boost/LLVM/WebKit containers this is probably 
the right thing to do.

***

Now, of course, `evalCall()` would suppress inlining. During `evalCall()` we 
currently do not know whether the function will be inlined or evaluated 
conservatively if none of the checkers evaluate it, but we can easily provide 
such information in `evalCall()`, so this is not a problem.

The problem with inlining is that we got names for iterator offsets wrong from 
the start, because we conjured them out of thin air and they are conflicting 
with the actual representation of offsets within the implementation of the 
container. Which brings us back to a renaming problem: the problem of renaming 
`$offset(i1) == $offset(i2)` into the actual return value `$x` that was 
computed via inlining. Moreover, this new renaming problem is ill-formed 
because renaming non-atomic symbols doesn't make any sense - we should instead 
rename `$offset(i1)` and `$offset(i2)` separately. Still, the problem is 
indeed, as you already noticed, solved easily when `$x` is concrete: we only 
need to assume that `$offset(i1) == $offset(i2)` or `$offset(i1) != 
$offset(i2)` depending on the concrete value of `$x`.

And if `$x` is symbolic, we could still //introduce a state split// here: on 
one branch both `$x` and `$offset(i1) == $offset(i2)` are false, on the other 
branch they both are true, and no additional tracking is ever be necessary. I 
believe that such state split is not invalid: it pretty much corresponds to the 
"eagerly assume" behavior, just for iterators. The question here is how much 
are we sure that both branches are possible. Even if neither inlining nor our 
custom iterator model managed to refute any of these two branches, one of the 
paths may still be infeasible. But the amount of error here is not larger than 
eagerly-assume, and for eagerly-assume it isn't that bad, so we could try. Of 
course, the alternative to state split is assuming things about `($offset(i1) 
== $offset(i2)) == $x`, but our constraint solver will not be able to handle 
such constraints, which is the very reason why we have problems with renaming 
(well, at least some of them; renaming temporary regions in C++ is slightly 
more complicated than that (understatement intended)). In fact, i think the 
reasoning behind having eager assumptions for numbers is exactly that: they 
wanted to make constraints simpler.

***

But still, i want to step back and ask the question that i really really want 
answered here: //if container methods are inlinable, do we actually want to 
track iterator positions manually?// Maybe just rely on inlining entirely when 
possible? Like, both for modeling and for bug-finding. Or only rely on 
`evalCall()`? Essentially, if inlining is not reliable enough for relying on it 
entirely (so we have to maintain a parallel checker-specific model of iterators 
and have these two models exchange information), why do we think that it is 
reliable enough for the purpose of evaluating iterator comparisons?

This question is in fact asked and answered, one way or another, intentionally 
or accidentally, with different levels of urgency, in every checker that tries 
to model effects of function calls. The most famous example of conscious 
approach to this problem (overstatement intended) so far is 
`RetainCountChecker` that has thousands of lines of code devoted solely to 
figuring out whether a function should be evaluated by the checker or inlined. 
Once the checker decides to rely on inlining while *also* modeling its own 
specific effects - inlining starts conflicting with modeling and the hell 
breaks loose.

So i believe it is very important to ask ourselves here: do we really want to 
//mix// our own symbolic model of iterators with the implicit model of 
iterators as plain structures in the Store that is automatically managed via 
inlining? Because if only we decide to either trust inlining entirely or not 
trust it at all, things become *much* easier. I expect that the amount of 
effort to keep these two models consistent with each other will explode very 
quickly as we add more features to the checker.

Now, the interesting part about keeping these two models consistent in this 
checker is that it is entirely a "constraint-like" problem: our effort consists 
entirely of adding new information about immutable entities (symbols) to the 
state. The state doesn't mutate - it is only being clarified. And when the 
state is actually mutated (eg., when we are modeling `operator++()`), no effort 
to maintain consistency is needed. This is very different from the problem 
we're having in `RetainCountChecker`, where it is a "store-like" problem, i.e. 
function calls we're trying to model are actively mutating the state that might 
have already been mutated within the inlined call, i.e. they're competing for 
the same data. And i believe that the difference is entirely in implementation 
details of these two checkers: there isn't that much difference in kinds of 
bugs they find, so i guess it's an interesting food for thought.

***

So, to summarize:

1. The eager state split solution is not all that bad here, and is also much 
easier to implement than the delayed state split you're trying to implement in 
this patch. Generally, any information produced by inlining is most likely 
pretty accurate and should, in a perfect world, be incorporated into the 
checker's state, given that the checker already decided to track some sort of 
internal information.
2. Still, i want to give you a heads up that the idea of maintaining an 
implicit (via checker traits and `evalCall()`) and an explicit (via Store and 
inlining) models of iterators simultaneously may result in very quick explosion 
in complexity. We already went into this direction because we didn't think 
about it this way, but i think it's never too late to reconsider.



================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:2066
            "Symbol comparison must be a comparison");
     return assumeNoOverflow(NewState, cast<SymIntExpr>(CompSym)->getLHS(), 2);
   }
----------------
P.S. What was the idea here? Like, `CompSym` was just computed via `BO_EQ` and 
has type of a condition, i.e. `bool` because we are in C++. Is this code trying 
to say that the result of the comparison is bounded by `true/2`?


Repository:
  rC Clang

https://reviews.llvm.org/D53701



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D53701: [Analyzer] Record and process comparison of symbols instead of iterator positions in interator checkers

Reply via email to