alexfh added a comment.

In https://reviews.llvm.org/D41077#958819, @szepet wrote:

> 4. FYI: There is a similar check under review which uses only the AST 
> provided information and implemented as a tidy-checker: 
> https://reviews.llvm.org/D20689 (As I see your checker does not uses symbolic 
> execution provided features. So, it would probably worth thinking about if 
> the analyzer is the project where the checker should be implemented. However, 
> @dcoughlin and @alexfh have more insight on the answer to this question. What 
> do you think? )


The main problem with the check proposed in https://reviews.llvm.org/D20689 
(which uses, IIUC, very similar algorithm to this one) is that the false 
positive rate on real projects is too high. The comment I made on the other 
review thread applies here as well:

| There's definitely potential in using parameter name and argument spelling to 
detect possibly swapped arguments, and there's a recent research on this topic, 
where authors claim to have reached the true positive rate of 85% with decent 
recall. See https://research.google.com/pubs/pub46317.html. If you're 
interested in working on this check, I would suggest at least looking at the 
techniques described in that paper. |
|

As for whether this analysis should be implemented in the static analyzer or as 
a clang-tidy check: I think, clang-tidy would be more suitable, since

1. this analysis doesn't need the path analysis machinery provided by the 
static analyzer
2. this analysis may benefit from suggesting automated fixes
3. the false positive rate of the state-of-the-art algorithm implementing 
similar analysis have a false positive rate of at least 15-30%, and a large 
fraction of false positives could  potentially be treated as "code smells" 
which can be fixed by giving function parameters or local variables proper 
names. I don't know whether this kind of analysis meets the bar the static 
analyzer sets for its checkers. Anna or Devin can tell more here.

> 5. In overall, please add comments to the function which describes its 
> purpose. Mainly on heuristic functions, it can help to understand it more 
> easily. Also, could you provide some results of the current heuristics, what 
> are the false positive rates on real projects like LLVM, FFmpeg, etc? I am 
> quite interested in that.

You can take the list of projects from https://reviews.llvm.org/D20689#878803, 
run the analysis on them, and see whether the results are any better.


Repository:
  rC Clang

https://reviews.llvm.org/D41077



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to