vsavchenko added a comment.
In D103605#2804421 <https://reviews.llvm.org/D103605#2804421>, @NoQ wrote:
> These are not other topics. We're discussing the overall direction into which
> this patchset is a large step. I absolutely welcome this step and am
> fascinated to see how it turns out but I want us to agree on the overall
> direction first.
OK, fair point, but for further plan, we need to have an incremental model of
how we want to achieve it.
> The reason why I come from this perspective is because that's how I think it
> is //supposed// to work and I'm not aware of a single good reason why it
> can't. I think the reason why it's this way is that the original authors seem
> to have bombarded the code with extra clauses until it starts demonstrating
> the more or less expected behavior, without trying to solve the problem on
> paper first. Like with visitor fixpoint iteration, it sounds like an
> unnecessary piece of complexity that we will be ultimately able to eliminate.
Right now it is all very tightly coupled. It is a lot of moving pieces that I
don't think anyone fully understands. If we take time and decompose existing
tracking into handlers, it can be a step in getting to know some implicit
connections between different tracking activities we have right now.
> Except, well, the distant constraints example a-la @RedDocMD's problem but
> for raw pointers. This is indeed an exceptional case which may cause two
> tracking efforts to run in parallel. This can be probably generalized to any
> constraint-like traits: if we want to explain a constraint, we'll have to be
> prepared that the constraint isn't part of the value track that we already
> have. This is the first time the above assumption was challenged and I
> definitely need time to process this but I still think there shouldn't be any
> other situations like this and there should be something special about this
> situation that will allow us to incorporate it into the linear-track solution.
>
> "At the moment such refactoring is difficult to achieve" is a perfectly
> satisfying answer to me with respect to the current state of things. But I
> want us to agree as early as possible about whether we want to achieve this.
> Simply so that as we make every step, we keep this in mind and evaluate every
> step as taking us either closer to this goal or farther from this goal and
> try to avoid the latter. And I don't think that anything prevents us from
> discussing this goal right now; the problem we're trying to solve isn't
> really ever going to change and we already have a fairly large amount of data.
I get it, but I think we got stuck and should put goal first and think about
what can be done to achieve this. Is it bad that tracking exists and the
analyzer reverse engineers what it did itself? - No doubt! Is note mechanism a
good alternative? - Sure! Is it enough right now? - No. Let's think how we
can get there with incremental changes.
> Ok, do I understand correctly that:
>
> - Functionality-wise, this is entirely equivalent to my NoteTag solution with
> "`isTracked()`";
Not really, you also get the context of tracking: two nodes and options (so you
can figure out if it's being tracked thoroughly or as a condition).
> - Boilerplate-wise it makes the `isTracked()` check implicit ("we were
> invoked at all therefore the value of the expression is tracked by...
> somebody") at the cost of asking the user to define a class;
I suppose we can put it this way.
> - Convenience-wise, it potentially requires the developer to reverse-engineer
> the symbolic execution step when the logic gets more complicated;
I don't see ANY difference with the situation where we use `isTracked`. If you
talk about notes, you can access them here as well. If you don't, then both
ways require the same.
> - Flexibility-wise... I guess you can make custom trackers now, which will
> track the value in a different manner, and communicate with the tracker in
> order to produce different notes?
Exactly, tracker object is the thing that is guaranteed to stay whenever you go
with tracking, and since you can customize it, you can affect every part of the
tracking process.
> Yeah, the store handler part seems perfectly legit because *I* don't see any
> better approaches here :) It's indeed a big problem that we can't customize
> existing notes and this sounds like the right data structure to solve the
> problem.
And we do need to have a central customizable object (aka Tracker) that we pass
around the tracking process to make it happen, right?
> Ok, I don't understand! If we have anyway tracked the value back to
> `b.get()`, why not track further through `b`? What makes `b.get()` the exact
> place at which we want to stop tracking the value? If we wanted to know why
> we choose the true path at the branch, shouldn't we track further through `b`
> as well in order to answer that question? If we didn't want to know why we
> choose the true path, why did we track the value back to `b.get()` in the
> first place?
>
> I guess one possible difference between `a` and `b` here would be in
> messaging, i.e. how we refer to that value.
Bingo!
> For example, we could refer to `a` as "the pointer value" (implying that this
> is the null-dereferenced pointer value) and refer to `b` as "a pointer value"
> (implying that it's some other value, maybe the one that participates in
> condition later). But even then, this distinction is much more coarse-grained
> than "I track the value" vs. "Someone else tracks the value". It's more like
> "Buggy value" vs. "Condition value" which is a distinction we already
> "almost" make with `TrackingKind`. Even if we wanted to identify our values
> as "pointer #1", "pointer #2", etc., we could totally do with a sufficiently
> advanced definition of `isTracked()`.
Correct!
>> Can we move inline defensive checks out of the BugReportVisitors.cpp with
>> `isTracked()` only?
>
> My original thinking was along the lines of, eg.,:
>
> void ExprEngine::processBranch(...) {
> Bldr.addTransition(..., getNoteTag([] (BugReport &BR) {
> if (BR.isTracked(Condition) && !N->getLocationContext().inTopFrame())
> BR.markInvalid();
> }));
> }
>
> And this is how, basically, the entirety of `trackExpressionValue()` could be
> decomposed into individual stateless note tags, with all the necessary state
> stuffed into the bug report. Including the tracking process itself: if note
> tags are sufficient to help with tracking, they're sufficient to do the
> tracking from the start.
Maybe, or maybe not. Maybe it takes more context than we think. You
essentially assume that the only thing that we need to reimplement is one
boolean variable `isTracked` and the expression that is being tracked. I doubt
it.
And it is harder to debug polling approach as opposed to the callback approach.
> This code could also be moved into a checker's `checkBranchCondition()` so
> that to make it truly checker-specific. That would come at the cost of
> creating a lot of redundant exploded nodes just for the sake of attaching the
> tag that will probably never be used so this is rather bad, wouldn't
> recommend.
Why not model defensive checks in the checker, so we don't make undesired
assumptions in the first place?
> I still think the solution that relies on well-defined end of tracking is
> significantly more elegant. Which, again, could be implemented with note
> tags: we could have a note tag that'd call a callback stuffed directly into
> the bug report.
I'm not sure I understood the last thing, can you please elaborate?
> So I think it can work both ways given that the tags solution would rely on
> having a sufficiently flexible mutable state as part of the `BugReport`
> object.
OK, let me re-iterate to get some things straight.
Refactoring of `trackExpressionValue` is good, but we need to think further
(and even further). You have a dream of getting rid of visitors and tracking
altogether so that users don't need to reverse engineer their own checkers
(preferably using tags). As a matter of transformation, you see that tracking
should provide two new concepts: `isTracking` predicate (similar to the
existing interesting-ness mechanism) and the way to detect where the tracking
stopped. You also don't want tracking to be a gigantic tree and have
everything, but to have one source of value without anything else. All other
parts should be decoupled and maybe even given a different name. We still want
to remove all different parts of other checkers from `BugReporterVisitors.cpp`
and probably can do it using just one abstraction of `isTracking` to do that.
And here is my take on things (it is a bit chaotic, sorry about that):
- If we want to move to `isTracking`, we still need a central component that
participates in the tracking or something that is shared through all tracking
logic. In my patches it is a `Tracker` object. It can be done differently by
passing such object directly into every `trackExpressionValue` and its
sub-visitors, but probably something more extensible would be a preference here.
- If we want to split `trackExpressionValue` and extract checker-specific
parts, we need a good and reliable way of splitting it. I used a time-tested
approach of function extraction: if we have `foo(A a, B b, C c) { ... }` with N
independent parts, we can refactor them into `foo(A a, B b, C c) { foo1(a, b,
c); foo2(a, b, c); ... fooN(a, b, c); }`. If one of the `fooK` is big, we can
repeat the same procedure. Instead of doing it plainly, I decided that the user
might be interested in putting their own `fooK` into the mix.
- Can we implement `isTracked` first and then split `trackExpressionValue`? My
answer is no. It implies bigger refactoring because `isTracking` doesn't
provide all the bits of `A a, B b, C c` that the logic used to receive before.
One example here is the `SuppressInlineDefensiveChecksVisitor`. When we add
it, we give it a node where the tracking of the expression was initiated, not
the node with expression itself. We also have a very specific comment there
that this is a requirement and expression own node might be "too late". If you
use `isTracked` in the Visitor you will be automatically in the "too late"
situation. And it is only one example. So, I insist that `isTracked` couldn't
be the initial step for refactoring.
- Can we implement "tracking has stopped mechanism" first and then split
`trackExpressionValue`? My answer is no. Tracking right now is a tree and it
is very convoluted. There is no one single point where it stops, it stops 100
of times. The concept of "stopped" in this situation is not useful (I'm still
not convinced that it's useful at all, but this is a part of a different
conversation). In order to fix that we need to split `trackExpressionValue`.
Contradiction.
- Can we implement `isTracked` afterwards? Yes, we can. Implementation of
`isTracked` on it's own is fairly simple. And we can have users of `isTracked`
and `ExpressionHandler` at the same time. Maybe, once we migrate all uses,
`ExpressionHandler` can be safely retired.
- Can we implement "tracking has stopped mechanism". Probably. This is
something that I doubt in usefulness, and don't really want to plan.
- Is there a way to get rid of tracking? Honestly, I don't think so. I do
believe, though, that it is possible to stop reverse engineering ourselves.
While the analyzer explores different paths, both the core and checkers somehow
affect the state. We can cal such a change of the state as "event". Events
can be both relevant and irrelevant (e.g. happened to a different smart
pointer) to the issue that is found. Many of these events are pivotal to the
user's understanding of the issue and we should explain them with notes. The
way to understand that it is relevant is through tracking. We might want to
keep these traces as we go, but it might be some extra price that we don't want
to pay.
- Let's get back to the events! Can we build our entire solution around them?
I do think so, actually. I think that the analyzer can produce a bunch of
events (maybe implemented as note tags). Then each bug report can provide a
visitor, not exploded graph visitor as it is now, but event visitor. It will
be automatically called for relevant events only (with tracking hidden under
the hood). Many events will have reasonable defaults so you don't need to
write your own implementations every time (e.g. "a is assigned with b here"),
but you can always override them. This is something that I started with
`StoreHandler` because "store" is one of those events. It is different though
from the perfect vision because we don't want to reverse engineer what the
engine/the checker did before, perfectly we would have this information in the
node (or on the edge) that "store event happened". Tracker will be very
trivial in this plan.
- Is it possible to be done incrementally? I do think so. We can extract
different events from `trackExpressionValue` and change them into various
`EventHandler`s (like `StoreHandler`). Probably, in the process, we will
understand what kind of data describes each type of events. Another step is
pretty hard, we need to design how we are going to generate (easier) and store
(harder) events during analysis. It should be pretty implicit, so it has a lot
of sensible defaults produced by the core. And checker developers should be
able to generate their own events (maybe even for other checkers, if they model
something that everyone can benefit from). I think it will be possible to
generate one event type at a time, and change how we handle it in the tracker,
i.e. move from reverse engineering to detecting relevant events in the graph.
Once we are finished with this, we can extract a separate thing called
`EventVisitor` and migrate a big chunk of `Tracker` and `EventHandler`
interfaces there.
- Why do we even need to introduce tracker and event handlers if we are going
to hide/delete them in this plan? The trick is about making this whole plan
manageable. One enormous refactoring is hard to plan and test. Big work can
demotivate and drain if you don't see the end of it. I believe that we can
make great things if we take a lot of baby steps.
Phew, I hope this answers it.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D103605/new/
https://reviews.llvm.org/D103605
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits