[PATCH] D135472: [ODRHash] Hash attributes on declarations.

Chuanqi Xu via Phabricator via cfe-commits Fri, 14 Oct 2022 09:27:04 -0700

ChuanqiXu added inline comments.


================
Comment at: clang/lib/AST/ODRHash.cpp:479-480
+
+  llvm::copy_if(D->attrs(), std::back_inserter(HashableAttrs),
+                [](const Attr *A) { return !A->isImplicit(); });
+}
----------------
aaron.ballman wrote:
> ChuanqiXu wrote:
> > aaron.ballman wrote:
> > > ChuanqiXu wrote:
> > > > aaron.ballman wrote:
> > > > > ChuanqiXu wrote:
> > > > > > vsapsai wrote:
> > > > > > > vsapsai wrote:
> > > > > > > > erichkeane wrote:
> > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > > vsapsai wrote:
> > > > > > > > > > > > I'm not sure `isImplicit` is the best indicator of 
> > > > > > > > > > > > attributes to check, so suggestions in this area are 
> > > > > > > > > > > > welcome. I think we can start strict and relax some of 
> > > > > > > > > > > > the checks if needed.
> > > > > > > > > > > > 
> > > > > > > > > > > > If people have strong opinions that some attributes 
> > > > > > > > > > > > shouldn't be ignored, we can add them to the tests to 
> > > > > > > > > > > > avoid regressions. Personally, I believe that alignment 
> > > > > > > > > > > > and packed attributes should never be silently ignored.
> > > > > > > > > > > Agreed. I feel `isImplicit` is enough for now.
> > > > > > > > > > The tricky part is -- sometimes certain attributes add 
> > > > > > > > > > additional implicit attributes and those implicit 
> > > > > > > > > > attributes matter 
> > > > > > > > > > (https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L9380).
> > > > > > > > > >  And some attributes seem to just do the wrong thing 
> > > > > > > > > > entirely: 
> > > > > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L7344
> > > > > > > > > > 
> > > > > > > > > > So I think `isImplicit()` is a good approximation, but I'm 
> > > > > > > > > > more wondering what the principle is for whether an 
> > > > > > > > > > attribute should or should not be considered part of the 
> > > > > > > > > > ODR hash. Type attributes, attributes that impact object 
> > > > > > > > > > layout, etc all seem like they should almost certainly be 
> > > > > > > > > > part of ODR hashing. Others are a bit more questionable 
> > > > > > > > > > though.
> > > > > > > > > > 
> > > > > > > > > > I think this is something that may need a per-attribute 
> > > > > > > > > > flag in Attr.td so attributes can opt in or out of it 
> > > > > > > > > > because I'm not certain what ODR issues could stem from 
> > > > > > > > > > `[[maybe_unused]]` or `[[deprecated]]` disagreements across 
> > > > > > > > > > module boundaries.
> > > > > > > > > I don't think 'isImplicit' is particularly good.  I think the 
> > > > > > > > > idea of auto-adding 'type' attributes and having the 'rest' 
> > > > > > > > > be analyzed to figure out which are important.
> > > > > > > > > 
> > > > > > > > > Alternatively, I wonder if we're better off just adding ALL 
> > > > > > > > > attributes and seeing what the fallout is. We can later 
> > > > > > > > > decide when we don't want them to be ODR significant (which, 
> > > > > > > > > might be OTHERWISE meaningful later!).
> > > > > > > > One criteria to decide which attributes should be hashed is if 
> > > > > > > > they affect IRGen. But that's not exhaustive and I'm not sure 
> > > > > > > > how practical it is.
> > > > > > > > 
> > > > > > > > The rule I'm trying to follow right now is if declarations with 
> > > > > > > > different attributes can be substituted instead of each other. 
> > > > > > > > For example, structs with different memory layout cannot be 
> > > > > > > > interchanged, so it is reasonable to reject them.
> > > > > > > > 
> > > > > > > > But maybe we should review attributes on case-by-case basis. 
> > > > > > > > For example, for `[[deprecated]]` I think the best for 
> > > > > > > > developers is not to complain about it but merge the attributes 
> > > > > > > > and then have regular diagnostic about using a deprecated 
> > > > > > > > entity.
> > > > > > > One option was to hash `isa<InheritedAttr>` attributes as they 
> > > > > > > are "sticky" and should be more significant, so shouldn't be 
> > > > > > > ignored. But I don't think this argument is particularly 
> > > > > > > convincing.
> > > > > > > 
> > > > > > > At this stage I think we can still afford to add all attributes 
> > > > > > > and exclude some as-needed because modules adoption is still 
> > > > > > > limited. Though I don't have strong feelings about this approach, 
> > > > > > > just getting more restrictive later can be hard.
> > > > > > It looks not a bad idea to add all attributes as experiments, 
> > > > > > @vsapsai how do you feel about this?
> > > > > My intuition is that we're going to want this to be controlled from 
> > > > > Attr.td on a case-by-case basis and to automatically generate the ODR 
> > > > > hash code for attribute arguments.
> > > > > 
> > > > > We can either force every attribute to decide explicitly (this seems 
> > > > > pretty disruptive as a final design but would be a really good way to 
> > > > > ensure we audit all attributes if done as an experiment), or we can 
> > > > > pick a default behavior to ODR hash/not. I suspect we're going to 
> > > > > want to pick a default behavior based on whatever the audit tells us 
> > > > > the most common situation is.
> > > > > 
> > > > > I think we're going to need something a bit more nuanced than "this 
> > > > > attribute matters for ODR" because there are attribute arguments that 
> > > > > don't always contribute to the entity (for example, we have "fake" 
> > > > > arguments that are used to give the semantic attribute more 
> > > > > information, there may also be cases where one argument matter and 
> > > > > another argument does not such as `enable_if` where the condition 
> > > > > matters greatly but the message doesn't matter much). So we might 
> > > > > need a marking on the attribute level *and* on the parameter level to 
> > > > > determine what all factors into attribute identity for generating the 
> > > > > ODR hashing code. Hopefully we can avoid needing more granularity 
> > > > > than that.
> > > > I feel the default behavior would be to do ODR hashes. I just took a 
> > > > quick look in Attr.td. And I feel most of them affecting the code 
> > > > generation. For example, there're a lot of attributes which is hardware 
> > > > related.
> > > > 
> > > > > because there are attribute arguments that don't always contribute to 
> > > > > the entity
> > > > 
> > > > When you're talking about "entities", I'm not sure if you're talking 
> > > > the "entities" in the language spec or a general abstract ones. I mean, 
> > > > for example, the `always_inline` attribute doesn't contribute to the 
> > > > declaration from the language perspective. But it'll be super odd if we 
> > > > lost it. So I feel it may be better to change the requirement to be 
> > > > "affecting code generation"
> > > > 
> > > > Also, to be honest, I am even not sure if we should take "affecting 
> > > > code generation " as the requirement. For example, for the 
> > > > `preferred_name` attribute, this is used for better printing names. It 
> > > > doesn't affect code generation nor the entity you mentioned. But if we 
> > > > drop it, then the printing name may not be what users want. I'm not 
> > > > sure if this is desired.
> > > > I feel the default behavior would be to do ODR hashes. I just took a 
> > > > quick look in Attr.td. And I feel most of them affecting the code 
> > > > generation. For example, there're a lot of attributes which is hardware 
> > > > related.
> > > 
> > > Hmmm, I'm seeing a reasonably even split. We definitely have a ton of 
> > > attributes like multiversioning, interrupts, calling conventions, 
> > > availability, etc that I think all should be part of an ODR hash. We also 
> > > have a ton of other ones that don't seem like they should matter for ODR 
> > > hashing like hot/cold, consumed/returns retained, constructor/destructor, 
> > > deprecated, nodiscard, maybe_unused, etc.
> > > 
> > > Then we have the really fun ones where I think we'll want them to be in 
> > > the ODR hash but maybe we don't, like `[[clang::annotate]]`, restrict, 
> > > etc.
> > > 
> > > So I think we really do need an actual audit of the attributes to decide 
> > > what the default should be (if there should even be one).
> > > 
> > > > When you're talking about "entities", I'm not sure if you're talking 
> > > > the "entities" in the language spec or a general abstract ones. I mean, 
> > > > for example, the always_inline attribute doesn't contribute to the 
> > > > declaration from the language perspective. But it'll be super odd if we 
> > > > lost it. So I feel it may be better to change the requirement to be 
> > > > "affecting code generation"
> > > 
> > > Not in a language spec meaning -- I meant mostly that there's some notion 
> > > of identify for the one definition rule and not all attribute arguments 
> > > contribute to that identity the same way.
> > > 
> > > I don't know that "affecting code gen" really captures the whole of it. 
> > > For example, whether a function is/is not marked hot or cold affects 
> > > codegen, but really has nothing to do with the identity of the function 
> > > (it's the same function whether the hint is there or not). 
> > > `preferred_name` is another example -- do we really think a structure 
> > > with that attribute is fundamentally a different thing than one without 
> > > that attribute? I don't think so.
> > > 
> > > To me, it's more about what cross-TU behaviors you'll get if the 
> > > attribute is only present on one side. Things like ABI tags, calling 
> > > conventions, attributes that behave like qualifiers, etc all contribute 
> > > to the identity of something where a mismatch will impact the correctness 
> > > of the program. But things like optimization hints and diagnostic hints 
> > > don't seem like they contribute to the identity of something because they 
> > > don't impact correctness.
> > I got your point. But from my developing experience, I want to say the 
> > optimization hints matters too. And I am wondering how about checking the 
> > ODR for all attributes. It would emit error if the attributes that 
> > affecting the correctness mismatches. And if the other attributes 
> > mismatches, a warning enough.  (If we worry about the compile time 
> > performance, I feel it won't take too long. And we can have an option to 
> > control it actually.)
> > I got your point. But from my developing experience, I want to say the 
> > optimization hints matters too. 
> 
> That's why I'd like to have an idea about what our policy is. To me, ODR 
> hashing should be restricted to finding ODR violations. It sounds like you'd 
> like ODR hashing to also optionally find other kinds of differences that 
> aren't necessarily ODR violations but still surprises nonetheless. I think 
> that's reasonable, but I think we should ensure the diagnostics distinguish 
> between "this is definitely UB" and "this might not have the optimization or 
> diagnostic properties you expect but is still correct".
> 
> > And I am wondering how about checking the ODR for all attributes. It would 
> > emit error if the attributes that affecting the correctness mismatches. And 
> > if the other attributes mismatches, a warning enough. (If we worry about 
> > the compile time performance, I feel it won't take too long. And we can 
> > have an option to control it actually.)
> 
> I don't think we should ODR-check all attributes blindly. There's no ODR 
> violation for one function having `[[nodiscard]]` in a TU and another 
> function not having it. In fact, attributes are sometimes used additively so 
> that you can put extra constraints locally that aren't there globally. e.g., 
> I've seen real code doing:
> ```
> // Header.h
> int some_func();
> 
> // Source.cpp
> #include "Header.h"
> 
> // We've had too many bugs from people ignoring the results,
> // which really matters in this particular file. Redeclare the API
> // with extra diagnostic checking for this TU.
> [[nodiscard]] int some_func();
> 
> ...
> ```
> It shouldn't cause ODR violation diagnostics if `some_func()` is defined in 
> another source file without the attribute.
Oh, yeah, you're right. Let's try to make the policy clearly when we started to 
it further.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135472/new/

https://reviews.llvm.org/D135472

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135472: [ODRHash] Hash attributes on declarations.

Reply via email to