[clang] [analyzer] Avoid use of `CallEvent`s with obsolete state (PR #160707)

Donát Nagy via cfe-commits Sat, 18 Oct 2025 16:20:42 -0700

NagyDonat wrote:

> Thanks for all the valuable feedback!
> 
> > We should separate out `State` from `CallEvent` or have some layer that 
> > would ensure that the State attached with a Call is always up to date.
> 
> @steakhal In an ideal world I agree with this, but I fear that both 
> suggestions have difficulties:
> 
>     * There are lots of methods that rely on the `State` within the 
> `CallEvent` – even seemingly innocent methods like 
> `SimpleFunctionCall::getDecl()` can depend on the state (when a function is 
> called through a pointer). Before checking this I thought that it would be 
> nice to separate out `State` from the `CallEvent`, but based on this I fear 
> that this wouldn't be practical.
> 
>     * Ensuring that the Call always refers to the most recent state is 
> practically impossible, because the "most recent state" is only tracked 
> informally: the developer knows which state variable is the most recent, but 
> this is not represented in a machine-readable way. (In some functional 
> languages there are so called "uniqueness types" or the [`State` 
> monad](https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html)
>  which can express invariants like "always use the most recent state" but C++ 
> doesn't provide a suitable abstraction for this goal.)
> 
> 
> Based on this right now I feel that the best approach would be gradually 
> improving the situation by smaller changes.
> 
> > > I will create a test with a new debug checker which modifies the state in 
> > > its `PreCall` callback and validates that the modification is visible 
> > > through the `CallEvent` objects received by the `EvalCall` and 
> > > `PointerEscape` callbacks. Is this sufficient, or would you prefer a 
> > > different kind of test?
> > 
> > 
> > I think the best way to go about this is a unittest with a custom bug 
> > report visitor that prints a special trait along the path. Like a logger. 
> > And the visitor would print these values alongside the ProgramPoint it is 
> > attached to.
> 
> I don't understand how could a bug reporter visitor help in this situation. 
> The bug (which is fixed by this PR) is that checker callbacks receive 
> `CallEvent` objects whose attached state is old (and differs from the state 
> available through the `CheckerContext` which is also passed to the same 
> callback). Testing this requires two components:
> 
>     * Checker callbacks that validate that the `CallEvent` and the 
> `CheckerContext` have (references to) the same state.
> 
>     * Checker callbacks that modify the state (at suitable points e.g. the 
> PreCall callback) to ensure that states that _may_ be different are actually 
> different. It wouldn't be too difficult to find "real" checkers that cause 
> suitable state changes, but to make the test simple and self-contained I'm 
> leaning towards a "synthetic" state change from a debug checker (the same one 
> that also does the validation).
>       A bug report visitor doesn't help with either of these components, 
> because it cannot observe the state attached to the `CallEvent` (it is just 
> passed to the checker callback, but not persisted in the `ExplodedGraph` for 
> later use by e.g. visitors) and it (obviously) cannot update the state.
> 
> 
> > From this perspective, yeah, it may be slightly useful to keep the original 
> > pre-call state around. Say, if you're modeling a function `void foo(int 
> > *arg)` that effectively does something like `if (*arg == 0) *arg = 1;`, and 
> > your checker callback is sufficiently complex, then at some point you may 
> > need to ask "Uhh can you please remind me what the original value was?" - 
> > and then `CallEvent` comes in helpful. But I don't think this minor benefit 
> > outweighs the confusion caused by calling that very specific program state 
> > "the state". I think it's much easier to handle that by deliberately 
> > remembering the original state in a local variable, and then pass that 
> > state around. It's probably not a popular situation in the first place.
> 
> I didn't even think about this usecase of the "archived" `State` attached to 
> the call event. I agree that this could be useful within checker code, and I 
> also agree that this is probably a rare situation and should be handled by 
> explicitly saving the original state in a local variable. (Note: this "state 
> attached to call becomes obsolete within the checker" situation is distinct 
> from the "checker callback receives a CallEvent with a state that is already 
> obsolete at the beginning of the call" bug which is fixed by this PR.)
> 
> > First, yeah, like you said, we could have a static or dynamical typestate 
> > that indicates whether `getReturnValue()` makes sense. (It only ever makes 
> > sense in `checkPostCall`.)
> 
> I support this idea -- writing a dynamic check would be trivial, and even a 
> static check is not too difficult.
> 
> > Then, it might be a good idea to simply make `CallEvent.getState()` 
> > _private_. It wasn't our goal to provide convenient access to the entire 
> > state. We just wanted a nice `getArgSVal()` and such. Maybe let's provide 
> > only the actually-useful, non-confusing methods?
> 
> Making `CallEvent.getState()` private would be a very easy "harm prevention" 
> change that would reduce the damage potential of this footgun. A quick search 
> reveals that outside of `CallEvent.{cpp,h}` this method is only used twice, 
> and in both of those locations it would be very simple to get the state from 
> the readily available `CheckerContext` instead. (This is very fortunate -- I 
> would have guessed that there is more of this...) In fact I'll soon create a 
> trivial separate PR for this proposal, because it's a trivial change and I 
> don't see any downside.
> 
> Unfortunately this is just "sweeping the problems under the rug" and not a 
> complete solution -- even if we cannot directly access the obsolete old state 
> object, the analyzer may still run into logically wrong behavior if methods 
> like `getArgSVal` produce obsolete results from the old state. There are 
> probably situations where the old state is not _too_ old and the values that 
> are looked up from it happen to be correct (e.g. a PreCall callback can't 
> rebind the value of an expression in the `Environment`), but reasoning about 
> the partial correctness of old states is not a safe long-term solution.
> 
> > Alternatively, it may be a good idea to implement these methods in 
> > `CheckerContext` itself, and then stop showing `CallEvent` to the user. So 
> > instead of `CallEvent.getArgSVal()` you'd need to type 
> > `CheckerContext.getCallArgSVal()`. Unfortunately this disconnects the user 
> > from the entire polymorphism of the `CallEvent` class. There's actually a 
> > lot of methods that would need to be reimplemented (like 
> > `AnyCXXConstructorCall.getCXXThisVal()` or 
> > `CXXAllocatorCall.getArraySizeVal()) and most of them wouldn't make sense 
> > most of the time.
> 
> WDYT about a change that keeps the `CallEvent` type hierarchy but ensures 
> that `CallEvent`s can only be constructed by the `CheckerContext`? I can 
> envision a situation where:
> 
>     * There are "empty"/"stateless"/"template" `CallEvent` objects that are 
> passed around in the engine e.g. in `ExprEngine::evalCall`.
> 
>     * The constructor of `CheckerContext` can (optionally) own a `CallEvent` 
> (of this empty stateless kind).
> 
>     * Checkers can use a new method called `CheckerContext::getCallEvent()` 
> which combines the stateless call event with the state of the 
> `CheckerContext` to return a "regular" `CallEvent`.
> 
> 
> > I think the assertion I propose is the easiest way to test this patch. 
> > I.e., the fact that the newly added assertion used to crash without the 
> > rest of the patch but doesn't crash anymore, is a sufficient defense 
> > against regressions in and of itself.
> 
> @haoNoQ Are you speaking about the assertion to validate that 
> `CheckerContext.getState() == CallEvent.getState()`? I agree that this 
> assertion enforces an invariant that should be held, but unfortunately I 
> don't see a "natural place" for it.
> 
>     * Placing this on the engine side of the engine/checker boundary (just 
> before calling the callback) seems natural, but that's very close to the 
> location where the `CheckerContext` and the (updated) `CallEvent` are created 
> -- so we would just write a tautologically passing check that almost looks 
> like `foo = bar; assert(foo == bar);`.
> 
>     * Placing this assertion into every checker callback (that takes a 
> `CallEvent`) would be too much "pollution" IMO.
> 
>     * Finally, we could place this assertion in a debug checker that is 
> designed for this purpose and activated in a test that ensures that its 
> callbacks are triggered (with frequent state changes to reveal the use of 
> obsolete states). This is essentially equivalent to the testing approach that 
> I suggested.


(I hope I answered everything. If I forgot something, feel free to ask for 
clarification.)

https://github.com/llvm/llvm-project/pull/160707
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] Avoid use of `CallEvent`s with obsolete state (PR #160707)

Reply via email to