[PATCH] D69560: [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-parameters-of-the-same-type' check

Aaron Ballman via Phabricator via cfe-commits Thu, 08 Oct 2020 06:59:16 -0700

aaron.ballman added a comment.

In D69560#2292505 <https://reviews.llvm.org/D69560#2292505>, @whisperity wrote:


> I'd like to resurrect the discussion about this. Unfortunately, the concept 
> of `experimental-` group (in D76545 <https://reviews.llvm.org/D76545>) has 
> fallen silent, too. We will present the results of this analysis soon at IEEE 
> SCAM (Source Code Analysis and Manipulation) 2020 
> <http://ieee-scam.org/2020>. While a previous submission about this topic was 
> rejected on the grounds of not being in line with the conference's usual, 
> hyper-formalised nature, with one reviewer //especially// praising the paper 
> for its nice touch with the empirical world and the fact that neither 
> argument swaps (another checker worked on by a colleague) nor the 
> interactions of this issue with the type system (this checker) was really 
> investigated in the literature for C++ before, we've tailored both the paper 
> and the implementation based on reviewer comments from the scientific world, 
> and the comments here.
> The reviews were mostly favourable, excl. comments about the lack of 
> formalisation and the unfortunate lack of modelling for template 
> instantiations. Such a cold cut, however, //only// introduces false negatives 
> into the result set. Which is fine, as the users will never see those 
> non-existent reports!

Congrats on the SCAM acceptance, I hope the presentation goes well!

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
> wrote:
>
>> This is a check that is almost impossible to comply with when developing new 
>> code because it is insufficiently clear on what constitutes "related" 
>> parameters.
>
> I agree the CppCG is rather vague about this. From an angle, it seems 
> intentional... "relatedness" is hard to pin down formally, it might vary from 
> one "module" to the next even in the same project. In a subsequent patch to 
> this, I've given a few good examples as to what can be reasonably culled by 
> relatedness heuristics. They filter a good chunk of the results, letting go 
> of most of the trivially "valid (but maybe nonsense) if swapped" functions 
> (min, max, find, etc.)

I agree that the vagueness seems intentional. That's usually reasonable when 
you want coding guidelines for a human to check, but doesn't help for 
production-quality automated checking. Do you happen to know how many false 
positives the check currently issues without that heuristic?

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
> wrote:
>
>> Two adjacent types in a function parameter list is an incredibly common 
>> situation that arises naturally when writing code [...] and designing around 
>> that from scratch is not always possible.
>
> I see the reasoning behind this. However, this feels more like a motivation 
> //against// the rule itself, and not the checker.

Correct, it's definitely a motivation against the rule itself.

> We can debate the justification for the rule's existence, but (and correct me 
> if I'm wrong) so far I have not seen any tool (that's publicly available, and 
> is for C(++), etc...) that attempts to check your code satisfying this 
> particular rule.

I'm not aware of any either. However, an indictment of the rule being too 
ambiguous is also an indictment of any checkers implementing that rule because 
they're going to have a higher failure rate than if the rule was not ambiguous.

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
> wrote:
>
>> (especially for constructors, where your parameter order has to match the 
>> declaration order within the class)
>
> CMIIW, but I believe this statement is not true. Aggregate initialisation 
> does not care about your constructors, and yes, takes arguments in the order 
> of fields. However, once you have a user-defined constructor, you should be 
> able to define your parameters in any order you like. The only thing you 
> //should// match is that the //member-init-exprs// of the constructor are 
> evaluated in order of field declarations, not in the order you wrote them.

My point is that it's far more common to match all three orders even though you 
don't have to. e.g.,

  // Common, correct
  class C {
    int a, b;
    float c;
  
  public:
    C(int a_, int b_, float c_) : a(a_), b(b_), c(c_) {}
  };
  
  // Usually held as a code smell, but correct and not commonly diagnosed
  class C {
    int a, b;
    float c;
  
  public:
    C(int a_, float c_, int b_) : a(a_), b(b_), c(c_) {}
  };
  
  // Incorrect, commonly diagnosed
  class C {
    int a, b;
    float c;
  
  public:
    C(int a_, float c_, int b_) : a(a_), c(c_), b(b_) {}
  };



> But trespassing that rule already emits a compiler warning, and in reality, 
> the trespass only causes an issue if there is a data dependency between your 
> fields. You //should// ensure your //member-init-exprs// are in the right 
> order (to guard that a later change introducing a data dependency won't blow 
> you up), but this has zero effect on the order of parameters of the 
> constructor.

I don't fully agree that it has zero effect on the parameter order of the 
constructors, but more importantly, what do you do about this sort of thing as 
a user?

  class Point {
    int x, y;
  
  public:
    Point(int x, int y);
  
    friend bool operator==(const Point &LHS, const Point &RHS);
  }
  
  class Rect {
    int x, y, width, height;
  
  public:
    Rect(int x, int y, int width, int height);
  };

These situations are not ones we should ever be issuing a diagnostic for IMO, 
but I would argue that at least in the case of the constructors, they violate 
the spirit of the C++ guideline rule (where swaps are plausible and 
problematic). This is why I think the *rule* is bogus and I'm not certain we 
should have a checker for it -- the rule itself is too broad and doesn't 
account for realistic code.

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
> wrote:
>
>> Retroactively applying this rule to an existing code base would be nearly 
>> impossible for a project of any size.
>
> It is hard in the general case (although there are some approaches and 
> foundations that we could build upon with refactoring, I've taken up the 
> mantle of spending some time with this research), but there are cases where 
> "modernisation" efforts, especially tool-aided ones are already possible. I 
> have been recently suggested reading this paper: //H. K. Wright: Incremental 
> Type Migration Using Type Algebra <http://research.google/pubs/pub49386/>// 
> While I agree that the case study in the paper is rather domain-specific if 
> we consider the example there: once a tool has identified a 
> variable/expression to not be a `double` but instead `abseil::Duration`, from 
> a function call where such a refactored argument is passed, you can (or, 
> hopefully, need to) replace the type of the parameter. And one such "adjacent 
> parameters of the same type" has been refactored.
>
> My position with the paper, and this implementation, is that it is more 
> useful for new projects than existing ones.

Agreed.

> It might definitely not be useful //for LLVM// (a humongous project!), and it 
> might not be possible to fix //every// report in an afternoon. But thanks to 
> the vast amount of IDEs integrating Clang-Tidy, if someone wishes to adhere 
> to this rule, they can have a (reasonably) immediate report the moment they 
> trespassed the rule. You can also consider it in an incremental mode (see my 
> comment above with incremental report gating via CodeChecker, but surely it's 
> equally trivial to implement in other pipelines) or only in new or new-ish 
> TUs/modules inside the project.

I understand this point, but it doesn't address the fact that users writing 
real code to solve problems will *often* run into situations where this rule 
diagnoses code that cannot be modified. Overloaded operators are a great 
example of this -- how do you avoid adjacent parameters of the same type in an 
overloaded comparison operator? How does someone writing a `Point` or `Rect` 
class avoid adjacent parameters of the same type in constructors? How does 
someone avoid writing adjacent parameters in a callback function whose 
signature is defined by a standard (like `bsearch()` or `qsort()` from the C 
standard)?

I think the answer to some of these questions is that we can introduce 
heuristics -- exempt overloaded operators from the check, exempt functions 
named `equal`, `compare`, `comp`, etc, exempt constructors (perhaps only when 
all data members have compatible types), etc. But without those sort of 
heuristics, I'm not certain the check has a lot of value for real code.

> In addition, there are plenty of guidelines other than CppCG, which have at 
> least some of their rules mapped to Tidy checks (and surely other analysers' 
> routines). If a project (be it of any size and any domain) does not wish to 
> adhere to a guideline (or a rule of a guideline), they can configure their 
> invocation to not run those checks.

We do have other modules for other coding guidelines like the CERT guidelines, 
etc. And sometimes those guidelines have similar questionable behavior (for 
instance, CERT bans any use of calling the `system()` function -- so users are 
left without recourse there). That may be a reason to adopt this check, but we 
often ask that when check authors find an onerous behavior in a rule, they go 
back to the rule authors to alert them of the problems and see if we can 
improve the rule at the same time as the check. If the rule authors come back 
and say "no, we really do want it to diagnose in those cases", then so be it.

> I agree the fact that Tidy not supporting "opt-in" checkers by default is a 
> bit of a drawback here, however, consider the "execution paths" one project 
> might take, w.r.t. to this analysis:
>
> //The checker is enabled, and it produces between one and a bazillion 
> reports.//
>
> 1. They realise their project does not conform to CppCG (in which case why is 
> the checker group enabled in the first place?) or specifically CppCG#I.24 and 
> decide to disable the check. All is well in the world, and nothing happened 
> excluding one new line being added to a configuration file.
> 2. Multiple projects big and famous enough see that their software are 
> "non-compliant" (at least from the PoV of conformance to CppCG#I.24). They 
> together start to question the necessity of such rule... maybe it shouldn't 
> be a rule in the first place, because it's woefully stupid and impossible to 
> adhere to. This is actionable data one can talk about at conferences and go 
> to the guideline maintainers(?) with, and, if so, the rule gets 
> deprecated/abolished/move to an "intellectual nitpicking" section. (In this 
> case, implementing this check was a nice learning opportunity, and 
> eventually, it will also be removed from Tidy.)
> 3. A //new// projects start developing, and they start seeing this rule. They 
> can also decide whether or not conformance to the rule is something they want 
> to live with. If not, previous cases apply. If so, we start to grow projects 
> that are //somewhat// more type-safe, or less error-prone.
>
> So far, we do not know how painful actual conformance to this rule will be. 
> And, as with most static analysis, we might never know if there ever was "one 
> life saved" by someone deciding to heed this warning. This won't be a panacea 
> overnight, but I still believe it, first, does not have to be, and second, is 
> a step in the right direction.

I agree the check doesn't have to be perfect, but given the triviality of 
finding a large number of false positives, it's worth our effort to see if we 
can find a solution that is more practical.

I think the concrete steps I'd like to see before moving forward with this 
check is:

- Pick 1-3 projects with whatever qualifications make sense (size, adherence to 
the core guidelines, etc) and measure how many diagnostics this check produces 
as-is and see if we can get an idea of what % of diagnostics are ones the user 
would struggle to do something with (where struggle is: can't change the order 
because of <reasons>, changing the order would make the code less readable, 
etc).
- See if we can come up with some heuristics to reduce the number of 
poor-quality diagnostics and measure the effectiveness against those projects.
- Once we have some ideas of ways to reduce the "noise" that we're happy with, 
circle back with the C++ Core Guideline authors to see if they'd be willing to 
tighten the rule up for those situations. If they *don't* want the changes, 
that's fine, we can hide them behind a configuration flag (and we can decide 
whether that flag should be on or off by default at that point).

WDYT?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69560/new/

https://reviews.llvm.org/D69560

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D69560: [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-parameters-of-the-same-type' check

Reply via email to