[clang] [analyzer] Avoid use of `CallEvent`s with obsolete state (PR #160707)

Artem Dergachev via cfe-commits Sat, 18 Oct 2025 10:44:22 -0700

https://github.com/haoNoQ commented:


Not really, this is some ancient history even by my standards 😅 But, yes, this 
has bummed me out a few times too.

I think the whole point of the `CallEvent` was to provide user-friendly access 
to path-sensitive aspects of the call such as `CallEvent.getArgSVal()` and 
`CallEvent.getReturnValue()`. But yeah, the way these checker callbacks accept 
the state twice, is absolutely weird. I'm pretty sure there even used to be a 
place where it wasn't true that `CheckerContext.getState() == 
CallEvent.getState()`. (IIRC I fixed one but there could be more places like 
that. I didn't have the guts to add that as an assertion at the invocation 
sites of these callbacks.)

Now, when it comes to the way the state mutates during the callback (such as 
making sure `getReturnValue()`  produces the value you've just bound to the 
call expression), I don't think `CallEvent` can reasonably handle it, even with 
a layer of indirection that keeps the state up to date. Like, _state splits_ 
are a thing. If your `evalCall()` splits the path in two and binds two 
different return values, which one do you return?

>From this perspective, yeah, it may be slightly useful to keep the original 
>pre-call state around. Say, if you're modeling a function `void foo(int *arg)` 
>that effectively does something like `if (*arg == 0) *arg = 1;`, and your 
>checker callback is sufficiently complex, then at some point you may need to 
>ask "Uhh can you please remind me what the original value was?" - and then 
>`CallEvent` comes in helpful. But I don't think this minor benefit outweighs 
>the confusion caused by calling that very specific program state "the state". 
>I think it's much easier to handle that by deliberately remembering the 
>original state in a local variable, and then pass that state around. It's 
>probably not a popular situation in the first place.

(You're probably aware of a significantly more annoying version of the problem: 
the problem of reading the original value of `*arg` from inside 
`checkPostCall`. This comes up in the taint checker a lot. But that's much more 
expensive to provide uniformly.)

So ultimately you'll need to be careful either way. But there's definitely a 
few ways we could eliminate the confusion.

First, yeah, like you said, we could have a static or dynamical typestate that 
indicates whether `getReturnValue()` makes sense. (It only ever makes sense in 
`checkPostCall`.)

Then, it might be a good idea to simply make `CallEvent.getState()` *private*. 
It wasn't our goal to provide convenient access to the entire state. We just 
wanted a nice `getArgSVal()` and such. Maybe let's provide only the 
actually-useful, non-confusing methods? Or we could rename the method to 
something more descriptive, like `getStateBeforeCall()` and 
`getStateAfterCall()` - and make those available/unavailable depending on the 
typestate of `CallEvent`. If we're so worried about the `*arg` situation we may 
also provide a `getSValBeforeCall(const MemRegion *)`.

Alternatively, it may be a good idea to implement these methods in 
`CheckerContext` itself, and then stop showing `CallEvent` to the user. So 
instead of `CallEvent.getArgSVal()` you'd need to type 
`CheckerContext.getCallArgSVal()`. Unfortunately this disconnects the user from 
the entire polymorphism of the `CallEvent` class. There's actually a lot of 
methods that would need to be reimplemented (like 
`AnyCXXConstructorCall.getCXXThisVal()` or `CXXAllocatorCall.getArraySizeVal()) 
and most of them wouldn't make sense most of the time.

Finally, if you're particularly interested in `getReturnValue()` updating in 
sync, it may be possible to do that if you somehow indicate your intent to 
never split the state in your callback. And that's valuable on its own because 
of all the accidental state splits that you can introduce by forgetting to 
chain your `addTransition()`s or `generateNonFatalErrorNode()`s. (The latter is 
particularly brutal because it really doesn't sound like it has anything to do 
with state splitting. Like, I want to emit two non-fatal errors, it's only 
natural that I invoke the generation of two non-fatal error nodes. With fatal 
errors it's actually a perfectly valid thing to do. But if later you decide 
that the errors aren't all that fatal, all of a sudden your entire checker 
breaks down. And good luck noticing that during testing.) So if there was a way 
to indicate the callback's intention explicitly, eg. 
`CheckerContext.setOutgoingNodeLimit(1)` (maybe even set it by default to `1`; 
most checkers need either that or `2`) then we'd avoid all the subtle bugs of 
that nature. Then, naturally, we'd also be able to have a 
`CheckerContext.setReturnValue()` method that's available only in `evalCall()` 
and only when the outgoing node limit is 1. And then you could look up that 
value as much as you want!

(Though, of course, you'd still never want to set the return value more than 
once.)

https://github.com/llvm/llvm-project/pull/160707
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] Avoid use of `CallEvent`s with obsolete state (PR #160707)

Reply via email to