[PATCH] D130306: [clang][dataflow] Analyze calls to in-TU functions

Gábor Horváth via Phabricator via cfe-commits Fri, 22 Jul 2022 14:34:41 -0700

xazax.hun added a comment.

In D130306#3671852 <https://reviews.llvm.org/D130306#3671852>, @samestep wrote:

> In D130306#3670259 <https://reviews.llvm.org/D130306#3670259>, @xazax.hun 
> wrote:
>
>> 
>
> Yes, we considered a summary-based approach, but we decided not to use it 
> because (as you mentioned) it would be much more difficult to implement, 
> especially for callees with nontrivial CFGs, which would result in a 
> nontrivial flow condition instead of just `Value`s in the `Environment`.

Thanks! I think this is a major design decision, and I'd love to see a 
discussion about the alternatives considered before jumping to an 
implementation. Specifically, I'd like to know if summaries are out of scope or 
something you would consider in the future. Knowing this is useful because this 
can influence how the code should be reviewed. E.g., if you plan to have 
multiple ways to do context sensitive analysis in the future, we should make 
sure that the current code will not lock us in in the inline substitution 
approach. If summaries are not planned at all, this is not a concern.

> Could you elaborate on what specific scalability problems you are concerned 
> about?

The Clang Static Analyzer is using this approach and it was a long way to iron 
out all the corner cases where the performance was bad. It has many cuts 
including maximum number of visits for a basic block, maximum call stack depth, 
not inlining functions after a certain size threshold and so on. Basically, it 
took some time to get the right performance and precision. But overall, the 
inline substitution approach will never scale to large call stacks/long calling 
contexts. On the other hand, a summary-based approach can potentially find bugs 
across a really large number of function calls with reasonable costs. Mainly 
because the same function is not reanalyzed for every context.

> To clarify, the main reason we want context-sensitive analysis is to allow us 
> to simplify our definitions of some models, such as the `optional` checker 
> model. The goal is to provide the analysis a mock implementation of an 
> `optional` type, and then use context-sensitive analysis (probably just one 
> or two layers deep) to model the constructors and methods.

Thanks, this is also useful context!

>> Some other related questions:
>>
>> - Why call noop analysis?
>
> The alternative in this case would be to use the same analysis for the callee 
> that is already being used for the caller? I agree that would be nicer to 
> serve a broader category of use cases. I didn't do that in this patch for a 
> couple reasons:
>
> 1. In the short term, that would require threading more information through 
> to the builtin transfer function, while this patch was meant to just be a 
> minimum viable product.
> 2. In the longer term, we probably don't need that for our specific goals 
> (just modeling simple fields of mock classes) mentioned above.

I see. This was not clear to me from the description of the patch notes. It 
seems to me that the goal here is to simplify the modeling of certain types, 
not general context-sensitive analysis. I reviewed this patch with the wrong 
idea in mind. If that is the goal, the current approach makes sense. But I 
think the comments should make clear what the intended use case and the 
limitations of the current approach is.

>> - Currently, it looks like the framework assumes functions that cannot be 
>> inlined are not doing anything. This is an unsound assumption, and I wonder 
>> if we should change that before we try to settle on the best values for the 
>> tunable parameters.
>
> Agreed, this assumption is unsound. However, the framework already makes many 
> other unsound assumptions in a similar spirit, so this one doesn't 
> immediately strike me as one that needs to change. I'll defer to people with 
> more context.

The main reason I wanted to call this out because it increasingly seems to be 
whenever a decision needs to be made, the framework is getting less and less 
sound. Basically, many design decisions here are very similar to what the Clang 
Static Analyzer is doing. Basically, since Clang already has an analysis 
framework for unsound analysis, I wanted to avoid you reinventing the wheel and 
doing something similar to CSA instead of providing something new for the use 
cases that cannot be covered by the CSA.

>> - I think the environment currently is not really up to context-sensitive 
>> evaluation. Consider a recursive function:
>
> Yes, I mentioned this in the patch description and in a comment in the 
> `pushCall` implementation: we plan to do this by modifying the set of fields 
> in the `DataflowAnalysisContext` class. I plan to do this in my next patch, 
> once this one is landed.

Oh, my bad! I somehow blinked and totally skipped that comment.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130306/new/

https://reviews.llvm.org/D130306

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130306: [clang][dataflow] Analyze calls to in-TU functions

Reply via email to