steakhal wrote:

Thank you for sharing your roadmap.
I don't recall there being consensus on the direction of dropping execution 
paths in the related 
[RFC](https://discourse.llvm.org/t/loop-handling-improvement-plans/80417/13).
The direction seems interesting, but without consensus I'm not sure we should 
pursue it in upstream LLVM.

> (As I said earlier, I'm strongly opposed to a solution where the 
> visitor-based suppression is activated by each checker [or many checkers] 
> because then we'd be discarding a significant percentage (perhaps even the 
> majority) of work done by the analyzer.)

My problem with this approach is still that it's intrusive. We can only cut 
those branches once all checkers agree on suppressing every report coming from 
them. And it's not only about upstream checkers: there could be downstream 
checkers that, for example, do much more with the ExplodedGraph than just 
reporting issues.

One particular (slightly broken) checker that comes to mind here is 
`UnreachableCodeChecker`, which checks `Eng.hasWorkRemaining()` in its 
`checkEndAnalysis` callback to see whether we explored all possible branches 
of the top-level function we just analyzed. If so, it checks which BasicBlocks 
of the CFG were left unvisited and diagnoses them as unreachable code. I'm not 
implying that this particular checker is of high quality or used in production 
(AFAIK); all I'm saying is that cases like this make me skeptical about not 
exploring these branches, and they weaken the future-proofing argument for me.
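
To make the concern concrete, here is a minimal sketch of the pattern I have 
in mind; this is not the upstream `UnreachableCodeChecker` implementation and 
the class name is made up, but `checkEndAnalysis` and `Eng.hasWorkRemaining()` 
are the real hooks such a checker relies on:

```cpp
// Simplified sketch of a checker whose logic only makes sense if the engine
// explored every branch of the analyzed function.
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
#include "llvm/ADT/SmallSet.h"

using namespace clang;
using namespace ento;

namespace {
class UnvisitedBlockSketchChecker : public Checker<check::EndAnalysis> {
public:
  void checkEndAnalysis(ExplodedGraph &G, BugReporter &BR,
                        ExprEngine &Eng) const {
    // If the engine stopped early (exhausted budget, or deliberately dropped
    // paths), an unvisited block no longer implies unreachable code: bail out.
    if (Eng.hasWorkRemaining())
      return;

    // Record which CFG blocks were entered on some explored path.
    llvm::SmallSet<unsigned, 32> Visited;
    for (auto I = G.nodes_begin(), E = G.nodes_end(); I != E; ++I)
      if (auto BE = I->getLocation().getAs<BlockEntrance>())
        Visited.insert(BE->getBlock()->getBlockID());

    // A full checker would now walk the CFG of the analyzed function and
    // report the blocks whose IDs are missing from `Visited`.
  }
};
} // namespace
```

If we start discarding "weak assumption" paths in the Engine, every consumer 
that does this kind of whole-graph reasoning has to be audited, including 
downstream ones we don't see.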

> Edit: the very recent bug report 
> https://github.com/llvm/llvm-project/issues/112527 is another example where 
> applying this heuristic to core.UndefinedBinaryOperatorResult would've 
> probably prevented the false positive.

Another way of looking at that issue is that if we had inlined 
`std::array::size()`, we would have known that the given branch is infeasible, 
like in [this](https://github.com/llvm/llvm-project/pull/109804) older case. 
In any case, I wouldn't draw far-reaching conclusions from it.
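
To illustrate what I mean (a made-up snippet, not the actual reproducer from 
issue 112527): if the analyzer inlines `std::array::size()`, it knows the call 
returns 4 and the guarded branch is dead; if `size()` stays opaque, the 
analyzer conjures a symbol for the return value and also explores the 
infeasible branch.

```cpp
#include <array>

// Hypothetical example: std::array<int, 4>::size() always returns 4.
int read_if_nonempty(const std::array<int, 4> &a) {
  if (a.size() == 0) // infeasible when size() is inlined and returns 4
    return -1;       // ...but still explored when size() is modeled opaquely
  return a[0];
}
```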

> I don't think that discarding the execution paths with these "weak" 
> assumptions would be a really big change:
> * It only discards paths, so it cannot introduce any false positives (unlike 
> the loop widening plans, which performs invalidation).
> * I don't think that there is a systemic class of bugs that are only found on 
> these "weak assumption execution path" branches and can be distinguished 
> algorithmically from the highly unwanted results.

I think different tool vendors may want to target different FP/TP rates. This 
argument only holds if we assume it's fine to drop TPs in exchange for 
dropping FPs. To me, it always depends. As a general guideline I agree with 
your sentiment, but with mass changes like this I'm no longer sure without 
empirical evidence.

> > [...] and with options for disabling it while developing and preparing this 
> > new version.
> 
> Obviously I can put the freshly developed stuff behind analyzer options if 
> there's demand for it. In the case of this `ArrayBoundV2` change I did not do 
> so, because `ArrayBoundV2` is an alpha checker and it's very noisy without 
> this patch, but if I extend this to stable checkers I'll probably introduce 
> an option.

The `ArrayBoundV2` changes make sense precisely because, as you said, it's an 
`alpha` checker. I'm not challenging that.

> > > @steakhal @ anybody else Is my eager implementation acceptable for you 
> > > (in theory, assuming that it's cleaned up sufficiently)? If yes, I'll 
> > > start to work on implementing the suggestions of @isuckatcs and any other 
> > > observations.
> > 
> > 
> > I don't think we reached consensus yet.
> 
> I see. What can I do to approach consensus? Is there some aspect of either of 
> the implementations that we should discuss in more detail? Or should we ask 
> other contributors?

I had a slight discomfort with implementing something in the Engine that could 
be done at a higher level, such as in a BugReport visitor.
What you said about dropping execution paths makes sense and could bring a big 
improvement in several aspects, such as the TP/FP rate and even analysis time. 
Research and experimentation tackling these issues is really important. The 
perfect is the enemy of the good, so I'm now fine with doing this in the 
Engine.

However, I still have major discomfort with extending this beyond OOBv2 and 
skipping the analysis of execution paths that we explore today, as you 
described in your plans. To convince me, I'd need evidence of a net benefit 
without any major blocking issues (for example, losing truly valuable TPs, or 
discovering that some internal components couldn't be upgraded to the new 
loop-modeling approach). Gathering that evidence is a lot more difficult, and 
may not even be practical to do incrementally.
The LLVM policies discourage feature branches, but they don't formulate an 
opinion on forks.
Maybe you would have more flexibility for reaching a stable version in a fork 
than by developing this directly upstream.
That way, once the prototype is complete, or mature enough, we could gather 
data supporting the claim that this is how loops should be handled.
Once that case is formulated, the maintainers could discuss it, see whether 
they agree with the claims, and then accept the patches.

I think the more experienced Static Analyzer developers participate in the 
discussion, the better.
I'd definitely expect the code owners to express their opinions and reach 
unanimous consensus before we accept patches that change how loops are modeled 
in general.

Given the title, summary, and content of this PR, though, I'm convinced that 
this isn't the right forum.
This discussion should happen on the LLVM Discourse.
