[clang] [analyzer][NFC] Introduce framework for checker families (PR #139256)

Balazs Benics via cfe-commits Sun, 18 May 2025 05:01:12 -0700

=?utf-8?q?Donát?= Nagy <donat.n...@ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.n...@ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.n...@ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.n...@ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.n...@ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.n...@ericsson.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/139...@github.com>

steakhal wrote:

> For a moment I was very happy to see this solution, but unfortunately XXX is 
> **not** the name of the checker class (= backend = full family), but instead 
> it is just yet another kind of "internal" name that is assigned to each 
> individual checker (= frontend = one checker from the family).
> 
> You're right that for the simple single-part checkers this kind of name 
> usually coincides with the checker class name, but it is still not a single 
> name for the full checker family. For example consider the class 
> `DivZeroChecker` which implements `core.DivideZero` and 
> `optin.taint.TaintedDiv` -- this is introduced in `Checkers.td` as
> 
> ```
> let ParentPackage = Core in {
> //...
> def DivZeroChecker : Checker<"DivideZero">,
>   HelpText<"Check for division by zero">,  
>   Documentation<HasDocumentation>;      
> //...
> }
> let ParentPackage = TaintOptIn in {
> //...
>  def TaintedDivChecker: Checker<"TaintedDiv">,                     
>    HelpText<"Check for divisions where the denominator is tainted "
>             "(attacker controlled) and might be 0.">,              
>    Dependencies<[TaintPropagationChecker]>,                        
>    Documentation<HasDocumentation>;
> //...                           
> }
> ```
> 
> In this particular case using "debug name = internal name of the _first_ 
> checker part" would work (and it is a core checker so we can say that it 
> won't be disabled), but if we have a checker family where all the parts are 
> non-core checkers, then this solution cannot assign a single stable name to 
> the whole family.

To me what is important with this debug name is to be able to easily connect it 
with an implementation file to look at.
Given that many times the checker name matches this debug name, and that is 
also spelled at least partially in the file name (even if  this isn't quite 
true all the cases), its a much better choice than hard-coding something.
This is the reason why I wanted to avoid "magic" prefixing it with 
"family-of-XXX" because at that point we would lose the ability to directly 
grep for this exact string and expect a match. This is a weak argument though 
because at this point we can expect that devs would be able to find the right 
checker even with the family prefix. This is why I don't have hard feelings 
about having this or not having this prefix. I'm good either way.

This debug name should be the XXX part of a registerXXX fn generated for each 
checker (let it be modeling or reporting), and if it's a checker family, then 
the debug name of all the "parts" should be the name of the family. This would 
directly point to what implementation to look at no matter how the checker is 
surfaced to the users, under what aliases or what have we.
In what I proposed as a quick demonstration is the user facing checker name, so 
that's not quite aligned with what I'd want to ideally see.

---

Reflecting on your proposal:
First, let's align on why we currently have the concept of "modeling" and 
"reporting" checkers. I think you are probably already aware of this, but let's 
clarify this.

It's a great property if the exploded graph remains sort of the same no matter 
what checkers are enabled. It makes it easier to reason about it and when and 
why we have sink nodes, affecting what paths explored etc.
In the past (and also still today AFAIK in some cases) checkers checked some 
property and reported a fatal error node. So there was a sink node if the 
checker was enabled, but as soon as it was disabled, the sink node was missing, 
discovering different issues in other checkers down the line in the exploded 
graph.
To solve this, was the effort of splitting the checkers into modeling and 
reporting and emit the sink node regardless. This way only the user-facing 
diagnostics would be different, but the underlying egraph would remain the same 
regardless if we enable or disable a checker. (This statement isn't quite true, 
but you get the principle).

So the problem was that we had to hack together multiple "checker frontends" 
and conditionally report a fatal bug or a sink node on an error.
In theory, a "checker frontend" should only ever depend on a set of "checker 
backends" aka. "modeling checkers".
I wish we had some better way of doing this. I think we are now better than 
ever to achieve this that we tie together the BugType and a checker frontend 
that we can query.

I reread both outlines solutions and both seemed better than what we have 
today. Neither are for free, but the second resonated more with me, and was 
aligned with my expectations. This makes me question why we want to improve the 
ergonomics of defining and organizing checkers? Is this the most important pain 
point we want to focus on right now?

To me btw, this checker families and what we had before with raw bool flags are 
the same thing. It's just now a lot less ugly.
I'll be honest with you, that right now I can't see if the currently more 
appealing direction with option 2 would be really the better path to take (or 
something along that line). So I think it's a risky direction to have so many 
changes on the flight in the subject before consolidating a statusquo first 
with checker families. After that we may have a better view what's next, if we 
still believe that the ergonomics is the best place to invest.

https://github.com/llvm/llvm-project/pull/139256
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer][NFC] Introduce framework for checker families (PR #139256)

Reply via email to