[PATCH] D135472: [ODRHash] Hash attributes on declarations.

Chuanqi Xu via Phabricator via cfe-commits Thu, 13 Oct 2022 19:26:38 -0700

ChuanqiXu accepted this revision.
ChuanqiXu added a comment.
This revision is now accepted and ready to land.


LGTM basically. Our discussion is mainly about the future work in Attr.td and 
not this patch. So I think they are not blocking issues.



================
Comment at: clang/lib/AST/ODRDiagsEmitter.cpp:337-354
+    if (!LHS || !RHS || LHS->getKind() != RHS->getKind()) {
+      DiagError(AttributeKind)
+          << (LHS != nullptr) << LHS << FirstDecl->getSourceRange();
+      DiagNoteAttrLoc(LHS);
+      DiagNote(AttributeKind)
+          << (RHS != nullptr) << RHS << SecondDecl->getSourceRange();
+      DiagNoteAttrLoc(RHS);
----------------
vsapsai wrote:
> ChuanqiXu wrote:
> > vsapsai wrote:
> > > ChuanqiXu wrote:
> > > > I feel like we can merge these two statements.
> > > Sorry, I don't really get what two statements you are talking about. Is 
> > > it 
> > > * `!LHS || !RHS || LHS->getKind() != RHS->getKind()`
> > > * `ComputeAttrODRHash(LHS) != ComputeAttrODRHash(RHS)`
> > > ?
> > Sorry for the ambiguity. Since `LHS->getKind() != RHS->getKind()` is 
> > covered by `ComputeAttrODRHash(LHS) != ComputeAttrODRHash(RHS)`. I feel 
> > like it is reasonable to:
> > 
> > ```
> > if (!LHS || !RHS || ComputeAttrODRHash(LHS) != ComputeAttrODRHash(RHS)) {
> >     DiagError();
> >     DiagNote();
> >     DiagNote();
> >     DiagNoteAttrLoc();
> >     return true;
> > }
> > ```
> There are 2 separate cases to improve the diagnostic. In the first case we'd 
> have
> ```
> ... first difference is definition in module 'FirstModule' found attribute 
> 'enum_extensibility'
> attribute specified here
> but in 'SecondModule' found no attribute
> ```
> 
> And if we reach the second ones, it implies the kind of attributes is the 
> same and the only difference is attribute arguments, so the diagnostics are 
> like
> ```
> first difference is definition in module 'FirstModule' found attribute ' 
> __attribute__((enum_extensibility("open")))'
> attribute specified here
> but in 'SecondModule' found different attribute argument ' 
> __attribute__((enum_extensibility("closed")))'
> attribute specified here
> ```
> 
> From my limited experience, it is actually useful to have more details than 
> trying to figure out the difference in attributes.
Agreed. I just wanted to say the codes can be further be shorten by making the 
diagnostic kinds contains more cases:

```
%select {
|
|
...
}
```

We have some such examples before. But this is not required. Since it actually 
moves the complexity from the source codes to the table gen. And the 
implementation doesn't look bad.


================
Comment at: clang/lib/AST/ODRHash.cpp:479-480
+
+  llvm::copy_if(D->attrs(), std::back_inserter(HashableAttrs),
+                [](const Attr *A) { return !A->isImplicit(); });
+}
----------------
aaron.ballman wrote:
> ChuanqiXu wrote:
> > aaron.ballman wrote:
> > > ChuanqiXu wrote:
> > > > vsapsai wrote:
> > > > > vsapsai wrote:
> > > > > > erichkeane wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > ChuanqiXu wrote:
> > > > > > > > > vsapsai wrote:
> > > > > > > > > > I'm not sure `isImplicit` is the best indicator of 
> > > > > > > > > > attributes to check, so suggestions in this area are 
> > > > > > > > > > welcome. I think we can start strict and relax some of the 
> > > > > > > > > > checks if needed.
> > > > > > > > > > 
> > > > > > > > > > If people have strong opinions that some attributes 
> > > > > > > > > > shouldn't be ignored, we can add them to the tests to avoid 
> > > > > > > > > > regressions. Personally, I believe that alignment and 
> > > > > > > > > > packed attributes should never be silently ignored.
> > > > > > > > > Agreed. I feel `isImplicit` is enough for now.
> > > > > > > > The tricky part is -- sometimes certain attributes add 
> > > > > > > > additional implicit attributes and those implicit attributes 
> > > > > > > > matter 
> > > > > > > > (https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L9380).
> > > > > > > >  And some attributes seem to just do the wrong thing entirely: 
> > > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L7344
> > > > > > > > 
> > > > > > > > So I think `isImplicit()` is a good approximation, but I'm more 
> > > > > > > > wondering what the principle is for whether an attribute should 
> > > > > > > > or should not be considered part of the ODR hash. Type 
> > > > > > > > attributes, attributes that impact object layout, etc all seem 
> > > > > > > > like they should almost certainly be part of ODR hashing. 
> > > > > > > > Others are a bit more questionable though.
> > > > > > > > 
> > > > > > > > I think this is something that may need a per-attribute flag in 
> > > > > > > > Attr.td so attributes can opt in or out of it because I'm not 
> > > > > > > > certain what ODR issues could stem from `[[maybe_unused]]` or 
> > > > > > > > `[[deprecated]]` disagreements across module boundaries.
> > > > > > > I don't think 'isImplicit' is particularly good.  I think the 
> > > > > > > idea of auto-adding 'type' attributes and having the 'rest' be 
> > > > > > > analyzed to figure out which are important.
> > > > > > > 
> > > > > > > Alternatively, I wonder if we're better off just adding ALL 
> > > > > > > attributes and seeing what the fallout is. We can later decide 
> > > > > > > when we don't want them to be ODR significant (which, might be 
> > > > > > > OTHERWISE meaningful later!).
> > > > > > One criteria to decide which attributes should be hashed is if they 
> > > > > > affect IRGen. But that's not exhaustive and I'm not sure how 
> > > > > > practical it is.
> > > > > > 
> > > > > > The rule I'm trying to follow right now is if declarations with 
> > > > > > different attributes can be substituted instead of each other. For 
> > > > > > example, structs with different memory layout cannot be 
> > > > > > interchanged, so it is reasonable to reject them.
> > > > > > 
> > > > > > But maybe we should review attributes on case-by-case basis. For 
> > > > > > example, for `[[deprecated]]` I think the best for developers is 
> > > > > > not to complain about it but merge the attributes and then have 
> > > > > > regular diagnostic about using a deprecated entity.
> > > > > One option was to hash `isa<InheritedAttr>` attributes as they are 
> > > > > "sticky" and should be more significant, so shouldn't be ignored. But 
> > > > > I don't think this argument is particularly convincing.
> > > > > 
> > > > > At this stage I think we can still afford to add all attributes and 
> > > > > exclude some as-needed because modules adoption is still limited. 
> > > > > Though I don't have strong feelings about this approach, just getting 
> > > > > more restrictive later can be hard.
> > > > It looks not a bad idea to add all attributes as experiments, @vsapsai 
> > > > how do you feel about this?
> > > My intuition is that we're going to want this to be controlled from 
> > > Attr.td on a case-by-case basis and to automatically generate the ODR 
> > > hash code for attribute arguments.
> > > 
> > > We can either force every attribute to decide explicitly (this seems 
> > > pretty disruptive as a final design but would be a really good way to 
> > > ensure we audit all attributes if done as an experiment), or we can pick 
> > > a default behavior to ODR hash/not. I suspect we're going to want to pick 
> > > a default behavior based on whatever the audit tells us the most common 
> > > situation is.
> > > 
> > > I think we're going to need something a bit more nuanced than "this 
> > > attribute matters for ODR" because there are attribute arguments that 
> > > don't always contribute to the entity (for example, we have "fake" 
> > > arguments that are used to give the semantic attribute more information, 
> > > there may also be cases where one argument matter and another argument 
> > > does not such as `enable_if` where the condition matters greatly but the 
> > > message doesn't matter much). So we might need a marking on the attribute 
> > > level *and* on the parameter level to determine what all factors into 
> > > attribute identity for generating the ODR hashing code. Hopefully we can 
> > > avoid needing more granularity than that.
> > I feel the default behavior would be to do ODR hashes. I just took a quick 
> > look in Attr.td. And I feel most of them affecting the code generation. For 
> > example, there're a lot of attributes which is hardware related.
> > 
> > > because there are attribute arguments that don't always contribute to the 
> > > entity
> > 
> > When you're talking about "entities", I'm not sure if you're talking the 
> > "entities" in the language spec or a general abstract ones. I mean, for 
> > example, the `always_inline` attribute doesn't contribute to the 
> > declaration from the language perspective. But it'll be super odd if we 
> > lost it. So I feel it may be better to change the requirement to be 
> > "affecting code generation"
> > 
> > Also, to be honest, I am even not sure if we should take "affecting code 
> > generation " as the requirement. For example, for the `preferred_name` 
> > attribute, this is used for better printing names. It doesn't affect code 
> > generation nor the entity you mentioned. But if we drop it, then the 
> > printing name may not be what users want. I'm not sure if this is desired.
> > I feel the default behavior would be to do ODR hashes. I just took a quick 
> > look in Attr.td. And I feel most of them affecting the code generation. For 
> > example, there're a lot of attributes which is hardware related.
> 
> Hmmm, I'm seeing a reasonably even split. We definitely have a ton of 
> attributes like multiversioning, interrupts, calling conventions, 
> availability, etc that I think all should be part of an ODR hash. We also 
> have a ton of other ones that don't seem like they should matter for ODR 
> hashing like hot/cold, consumed/returns retained, constructor/destructor, 
> deprecated, nodiscard, maybe_unused, etc.
> 
> Then we have the really fun ones where I think we'll want them to be in the 
> ODR hash but maybe we don't, like `[[clang::annotate]]`, restrict, etc.
> 
> So I think we really do need an actual audit of the attributes to decide what 
> the default should be (if there should even be one).
> 
> > When you're talking about "entities", I'm not sure if you're talking the 
> > "entities" in the language spec or a general abstract ones. I mean, for 
> > example, the always_inline attribute doesn't contribute to the declaration 
> > from the language perspective. But it'll be super odd if we lost it. So I 
> > feel it may be better to change the requirement to be "affecting code 
> > generation"
> 
> Not in a language spec meaning -- I meant mostly that there's some notion of 
> identify for the one definition rule and not all attribute arguments 
> contribute to that identity the same way.
> 
> I don't know that "affecting code gen" really captures the whole of it. For 
> example, whether a function is/is not marked hot or cold affects codegen, but 
> really has nothing to do with the identity of the function (it's the same 
> function whether the hint is there or not). `preferred_name` is another 
> example -- do we really think a structure with that attribute is 
> fundamentally a different thing than one without that attribute? I don't 
> think so.
> 
> To me, it's more about what cross-TU behaviors you'll get if the attribute is 
> only present on one side. Things like ABI tags, calling conventions, 
> attributes that behave like qualifiers, etc all contribute to the identity of 
> something where a mismatch will impact the correctness of the program. But 
> things like optimization hints and diagnostic hints don't seem like they 
> contribute to the identity of something because they don't impact correctness.
I got your point. But from my developing experience, I want to say the 
optimization hints matters too. And I am wondering how about checking the ODR 
for all attributes. It would emit error if the attributes that affecting the 
correctness mismatches. And if the other attributes mismatches, a warning 
enough.  (If we worry about the compile time performance, I feel it won't take 
too long. And we can have an option to control it actually.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135472/new/

https://reviews.llvm.org/D135472

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135472: [ODRHash] Hash attributes on declarations.

Reply via email to