[clang] [analyzer] Don't assume third iteration in loops (PR #119388)

Donát Nagy via cfe-commits Fri, 13 Dec 2024 07:14:36 -0800

NagyDonat wrote:

Thanks for the review, I'll fix the minor remarks soon.


> I wonder how you found out that loop widening may conflict with this 
> heuristic, were any tests broken that raised your awareness?

Yes, some (IIRC three) testcases fail in `loop-widening.c`.

> I have one concern with `didEagerlyAssumeBifurcateAt`, that it is bound and 
> overwritten at the most recent eager split - which to my knowledge may not be 
> the loop condition if we have something like this:
> 
> ```c++
> while (coin()) {
>   if (coin()) { // #inner_coin
>     ++num;
>   }
> }
> ```
> 
> Each time the `inner_coin` would spoil the `LastEagerlyAssumeBifurcationAt` 
> trait between the loop iterations, right?
> 
> Storing `const Expr *` pointers is brittle for recursive functions too, where 
> the same expression could appear in different LocationContexts. I wonder if 
> we could make an example where it would confuse this heuristic.
> 
> I'm also not entirely convinced about nested loops - if this heuristic would 
> kick in. This is basically a variant of the `coin` case I showed an example.
> 
> I don't have a counter example just yet, but I'm not sure I can spare the 
> time to craft one.

These situations do not cause problems for my goals, because I only intend to 
use `LastEagerlyAssumeBifurcationAt` as a very short-term storage between the 
node where the eager assumption happens and the `processBranch` call directly 
after it within the same iteration. I did not intend to reuse this information 
in later iterations of the same loop.

However, it's nice that you highlighted this question, because now I realized 
that my implementation is buggy (does not do what I wanted). For example in
```cpp
void foo(int x) {
  while (x == 0) {
    // body that does not change x and does not trigger any EagerlyAssume
  }
}
```
EagerlyAssume would succeed at the beginning of the first iteration, and this 
would set `LastEagerlyAssumeBifurcationAt` to the expression `x == 0`. This 
would stay in the state during the subsequent iterations, and at the beginning 
of the third iteration the code that I added would notice its presence and 
prevent the modeling of the third iteration. (I want to stop continuing in the 
loop after the second iteration only if the loop condition is ambiguous _at the 
beginning of that particular iteration_, not after some earlier iteration.)

To fix this problem, it would be sufficient to e.g. ensure that 
`evalEagerlyAssumeBifurcation` sets `LastEagerlyAssumeBifurcationAt` to 
`nullptr` when the bifurcation fails (= the value is not ambiguous = there is 
only one child node). However, there might be better solutions to implement 
this "did EagerlyAssume split the state _right now_ directly before this 
`processBranch` callback?" check that I need. (E.g. perhaps we could walk a few 
steps backwards on the ExplodedGraph -- but I don't know what kinds of nodes 
should I expect and I'm afraid that what I could write would run into pitfalls 
in unusual cases.)

`<rant>`From a higher level POV I think that `EagerlyAssume` is an ugly hack 
which makes the ExplodedGraph harder to understand or reason about. I'm certain 
that there are (or at least were) some clever corner cases where these eager 
assumptions provide a better analysis (so I don't hope that we can just disable 
it), but I'm still annoyed that it exists.`</rant>` 

https://github.com/llvm/llvm-project/pull/119388
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] Don't assume third iteration in loops (PR #119388)

Reply via email to