whisperity added a comment.

@aaron.ballman I've gone over LLVM (and a few other projects). Some general 
observations:

- Length of `2` **is vile**. I understand that the C++CG rule says even lengths 
of 2 should be matched, but that is industrially infeasible unless one 
introduces such a rule incrementally to their project. Findings of length 2 are 
**, in general,** an order of magnitude more than... basically the rest of the 
findings.
  - On big projects, even the current "default" of "length `3`" seems to be too 
low. In reality, one should consider (this is likely to be out of scope for 
Tidy) how often these functions are called, and various other metrics on how 
serious an offender is.
  - LLVM: **7206** len-2 findings, 767 len-3, 194 len-4, 59 len-5, etc.)
  - OpenCV: **4391** len-2, 987 len-3, 507, 115, ...
    - I did not manually evaluate OpenCV, and I think OpenCV is a project on 
which - due to their seeming design principle of type erasure - this check is 
useless.
    - (Similar could be said about LLVM, we are cutting a lot of corners on 
"niceness" for absolute bleeding-edge performance!)
- Most of the findings come from low-level types, such as `bool` or `int` or 
`unsigned` or some project or library-specific (such as a few cases of `SDVal` 
in LLVM) types.
- Strings and string-like types (such as `StringRef`, `char *`, etc.) are also 
huge offenders.
  - Some of these offenders can be side-stepped with semantic typedefs (D66148 
<https://reviews.llvm.org/D66148>)
- Further heuristics are definitely needed, but this patch is huge enough as 
is, I'd introduce further heuristics in future patches:
  - Allow the user to configure the set of ignored types and variable names 
from the command-line (I think this already exists as a //FIXME// in the code 
as of now)
  - Relatedness check for parameter names (more or less like in D20689 
<https://reviews.llvm.org/D20689>!) //if// we can fetch parameter names -- I am 
thinking about trying to parse the variable names to find if they have a common 
prefix and number.
  - If the parameters have a //sufficient// (question is, what is sufficient) 
common ancestor in the AST (like as @zporky mentioned, assignment, comparison, 
etc., but also should consider passing them to the same function -- this would 
ignore forwarding and factory methods nicely on the first run --), they should 
be silenced.
    - Forwarding and trampoline functions I've put into the //False-positive// 
category, as they will be ignored once such a heuristic is complete.
    - This contributed a huge chunk of the false-positive ratio. Due to the 
trampolines and factories, and "forwarding overloads", it's around 40% on 
Xerces and LLVM. Without it, around 15%.
      - This more or less follows naturally, one offending function is one true 
positive, but every trampoline, pimple, overload, etc. on it is basically an 
additional false positive.
- Making decisions on code you're not familiar with is hard ;)

This is most likely not possible purely from Tidy as this requires the wrangler 
tooling support, but I believe people who wish to use this check should be 
encouraged to **only** take the introduced differences 
<http://codechecker.readthedocs.io/en/latest/jenkins_gerrit_integration/#execute-shell>
 into consideration on their project.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69560/new/

https://reviews.llvm.org/D69560



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to