[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Corentin Jabot via Phabricator via cfe-commits Mon, 27 Sep 2021 07:31:10 -0700

cor3ntin added inline comments.


================
Comment at: clang/lib/Parse/ParseDecl.cpp:422-423
 
-        ExprResult ArgExpr(
-            Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression()));
+        ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr(
+            ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind)));
         if (ArgExpr.isInvalid()) {
----------------
aaron.ballman wrote:
> cor3ntin wrote:
> > aaron.ballman wrote:
> > > cor3ntin wrote:
> > > > aaron.ballman wrote:
> > > > > Hmmm, I'm not certain about these changes.
> > > > > 
> > > > > For some attributes, the standard currently requires accepting any 
> > > > > kind of string literal (like `[[deprecated]]` 
> > > > > https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to 
> > > > > change that, but it's not yet accepted by WG21 (let alone WG14). So 
> > > > > giving errors in those cases is a bit of a hard sell -- I think 
> > > > > warnings would be a bit more reasonable.
> > > > > 
> > > > > But for other attributes (like `annotate`), it's a bit less clear 
> > > > > whether we should *prevent* literal prefixes because the attribute 
> > > > > can be used to have runtime impacts (for example, I can imagine 
> > > > > someone using `annotate` to emit the string literal bytes into the 
> > > > > resulting binary). In some cases, I think it's very reasonable (e.g., 
> > > > > `diagnose_if` should behave the same as `deprecated` and `nodiscard` 
> > > > > because those are purely about generating diagnostics at compile 
> > > > > time).
> > > > > 
> > > > > I kind of wonder whether we're going to want to tablegenerate whether 
> > > > > the argument needs to be parsed as unevaluated or not on an 
> > > > > attribute-by-attribute basis.
> > > > Yep, I would not expect this to get merge before P2361 but I think the 
> > > > implementation experience is useful and raised a bunch of good 
> > > > questions.
> > > > I don't think it ever makes sense to have `L` outside of literals - but 
> > > > people *might* do it currently, in which case there is a concern about 
> > > > whether it breaks code (I have found no evidence of that though).
> > > > 
> > > > If we wanted to inject these strings in the binary - in some form, then 
> > > > we might have to transcode them at that point.
> > > > I don't think the user would know if the string would be injected as 
> > > > wide or narrow (or something else) by the compiler.
> > > > 
> > > > `L` is really is want to convert that string _at that point_. in an 
> > > > attribute, strings might have multiple usages so it's better to delay 
> > > > any transcoding.
> > > > Does that make sense?
> > > > 
> > > > But I agree that a survey of what each attribute expects is in order.
> > > > 
> > > > 
> > > > 
> > > > Yep, I would not expect this to get merge before P2361 but I think the 
> > > > implementation experience is useful and raised a bunch of good 
> > > > questions.
> > > 
> > > Absolutely agreed, this is worthwhile effort!
> > > 
> > > > If we wanted to inject these strings in the binary - in some form, then 
> > > > we might have to transcode them at that point.
> > > > I don't think the user would know if the string would be injected as 
> > > > wide or narrow (or something else) by the compiler.
> > > 
> > > My intuition is that a user who writes `L"foo"` will expect a wide 
> > > `"foo"` to appear in the binary in the cases where the string ends up 
> > > making it that far.
> > > 
> > > > L is really is want to convert that string _at that point_. in an 
> > > > attribute, strings might have multiple usages so it's better to delay 
> > > > any transcoding.
> > > > Does that make sense?
> > > 
> > > Not yet, but I might get there eventually. :-D My concern is that vendor 
> > > attributes can basically do anything, so there's no reason to assume that 
> > > any given string literal usage should or should not transcode. I think we 
> > > have to decide on a case by case basis by letting the attributes specify 
> > > what they intend in their argument lists. However, my intuition is that 
> > > *most* attributes will expect unevaluated string literals because the 
> > > string argument doesn't get passed to LLVM.
> > The status quo is that everything transcodes.
> > 
> > But not transcoding, we do not destroy any information as to what is in the 
> > source.
> > 
> > If an attribute then wants to use the string later in such a way that it 
> > needs to transcode to a literal encoding (or something else, for example, 
> > one might imagine a fun scenario where literal are ASCII encoded and debug 
> > information are EBCDIC encoded), then that can be done, because the string 
> > still exists.
> > 
> > Whereas for literal specifically, we assume they will be evaluated by the 
> > abstract machine as per phase 5 so we transcode them immediately. which is 
> > destructive. we get away with it because the original spelling is in the 
> > source if we need it, and currently, literals are also assumed to be 
> > (potentially invalid because of `\x` escape sequences) UTF-8.
> > 
> > There is an alternative design where string literals are not transcoded 
> > until lazily evaluated but I'm not sure there is a big motivation for that.
> > 
> > So this PR is exactly trying not to force a specific behavior on attributes 
> > that I assume can be displayed, put into some form in the binary, or 
> > converted to literal which might represent 3 distinct encodings. The parser 
> > leaving them as Unicode is the least opinionated thing the parser can 
> > possibly do.
> > And then each attribute can decide for itself if it needs to transcode, and 
> > how to handle any errors if they occur.
> > An attribute might decide to keep both a Unicode and non-Unicode spelling 
> > around if the string has a dual purpose, etc
> > 
> > 
> > Question though, Is there a scenario in which `\x`/`\0` would actually be 
> > useful in the context of attributes? Because if so, then we might need to 
> > do something to allow that.
> > Question though, Is there a scenario in which \x/\0 would actually be 
> > useful in the context of attributes? Because if so, then we might need to 
> > do something to allow that.
> 
> Emitting binary data is the biggest use case I can think of, but I don't 
> think we have any Clang attributes that do this currently. It's possible 
> there are plugin-based attributes that need that functionality, but it also 
> seems unlikely.
Even if we wanted to support that in the future, there is no rule that says 
that attributes can't have evaluated strings. It's on a case by case basis


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Reply via email to