[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Aaron Ballman via Phabricator via cfe-commits Mon, 27 Sep 2021 06:31:01 -0700

aaron.ballman added a comment.

Should we also change this to parse an unevaluated string literal: 
https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/ModuleMap.cpp#L1594
Similar question for all the uses in OpenMP stemming from: 
https://github.com/llvm/llvm-project/blob/main/clang/lib/Parse/ParseOpenMP.cpp#L825
 (https://www.openmp.org/spec-html/5.1/openmp.html is the OpenMP spec but I 
didn't see any mention of encoding prefixes when I peeked)


I think we still need to handle UDLs so that we parse the declaration's `""` as 
an unevaluated string literal.



================
Comment at: clang/include/clang/AST/Expr.h:1845
   StringRef getString() const {
-    assert(getCharByteWidth() == 1 &&
+    assert((isUnevaluated() || getCharByteWidth() == 1) &&
            "This function is used in places that assume strings use char");
----------------
Do we also want to assert that if it is unevaluated, it's char byte width *is* 
one byte? (No such thing as a multibyte unevaluated string literal.)


================
Comment at: clang/include/clang/Basic/DiagnosticParseKinds.td:58
 
+
 def warn_misleading_indentation : Warning<
----------------
You can revert the unintended whitespace change to drop this whole file from 
the review.


================
Comment at: clang/lib/AST/Expr.cpp:1082
   StringLiteralBits.NumConcatenated = NumConcatenated;
+  StringLiteralBits.CharByteWidth = 1;
+
----------------
This should be in an `else` clause along with `StringLiteralBits.IsPascal = 
false;`.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1648
         // have the same ud-suffix.
-        if (UDSuffixBuf != UDSuffix) {
+        const bool UnevaluatedStringHasUDL = Unevaluated && !UDSuffix.empty();
+        if (UDSuffixBuf != UDSuffix || UnevaluatedStringHasUDL) {
----------------



================
Comment at: clang/lib/Lex/LiteralSupport.cpp:95-96
+  case '?':
+  case 'n':
+  case 't':
+    return true;
----------------
aaron.ballman wrote:
> cor3ntin wrote:
> > aaron.ballman wrote:
> > > Do you intend to miss a bunch of escapes like `\'` and `\r` (etc)?
> > \' is there. I am less sure about '\r' and '\a'. for example. This is 
> > something I realized after writing P2361.
> > what does '\a` in static assert mean? even '\r' is not so obvious
> Looking at the list again, I think only `\a` is really of interest here. I 
> know some folks like @jfb have mentioned that `\a` could be used to generate 
> an alert sound on a terminal, which is a somewhat useful feature for a failed 
> static assertion if you squint at it hard enough.
> 
> But the rest of the missing ones do seem more questionable to support.
@jfb and @cor3ntin -- any opinions on whether `\a` should be supported? My 
opinion is that it should be supported because it has some utility for anyone 
running the compiler from a command line, but it's a pretty weak opinion.


================
Comment at: clang/lib/Lex/Pragma.cpp:807
   if (Tok.is(tok::string_literal) && !Tok.hasUDSuffix()) {
     StringLiteralParser Literal(Tok, PP);
     if (Literal.hadError)
----------------
Should this also be modified?


================
Comment at: clang/lib/Parse/ParseDecl.cpp:374-376
+  ParsedAttr::Kind AttrKind =
+      ParsedAttr::getParsedKind(AttrName, ScopeName, Syntax);
+
----------------
I don't think this needed to move?


================
Comment at: clang/lib/Parse/ParseDecl.cpp:422-423
 
-        ExprResult ArgExpr(
-            Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression()));
+        ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr(
+            ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind)));
         if (ArgExpr.isInvalid()) {
----------------
cor3ntin wrote:
> aaron.ballman wrote:
> > cor3ntin wrote:
> > > aaron.ballman wrote:
> > > > Hmmm, I'm not certain about these changes.
> > > > 
> > > > For some attributes, the standard currently requires accepting any kind 
> > > > of string literal (like `[[deprecated]]` 
> > > > https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to 
> > > > change that, but it's not yet accepted by WG21 (let alone WG14). So 
> > > > giving errors in those cases is a bit of a hard sell -- I think 
> > > > warnings would be a bit more reasonable.
> > > > 
> > > > But for other attributes (like `annotate`), it's a bit less clear 
> > > > whether we should *prevent* literal prefixes because the attribute can 
> > > > be used to have runtime impacts (for example, I can imagine someone 
> > > > using `annotate` to emit the string literal bytes into the resulting 
> > > > binary). In some cases, I think it's very reasonable (e.g., 
> > > > `diagnose_if` should behave the same as `deprecated` and `nodiscard` 
> > > > because those are purely about generating diagnostics at compile time).
> > > > 
> > > > I kind of wonder whether we're going to want to tablegenerate whether 
> > > > the argument needs to be parsed as unevaluated or not on an 
> > > > attribute-by-attribute basis.
> > > Yep, I would not expect this to get merge before P2361 but I think the 
> > > implementation experience is useful and raised a bunch of good questions.
> > > I don't think it ever makes sense to have `L` outside of literals - but 
> > > people *might* do it currently, in which case there is a concern about 
> > > whether it breaks code (I have found no evidence of that though).
> > > 
> > > If we wanted to inject these strings in the binary - in some form, then 
> > > we might have to transcode them at that point.
> > > I don't think the user would know if the string would be injected as wide 
> > > or narrow (or something else) by the compiler.
> > > 
> > > `L` is really is want to convert that string _at that point_. in an 
> > > attribute, strings might have multiple usages so it's better to delay any 
> > > transcoding.
> > > Does that make sense?
> > > 
> > > But I agree that a survey of what each attribute expects is in order.
> > > 
> > > 
> > > 
> > > Yep, I would not expect this to get merge before P2361 but I think the 
> > > implementation experience is useful and raised a bunch of good questions.
> > 
> > Absolutely agreed, this is worthwhile effort!
> > 
> > > If we wanted to inject these strings in the binary - in some form, then 
> > > we might have to transcode them at that point.
> > > I don't think the user would know if the string would be injected as wide 
> > > or narrow (or something else) by the compiler.
> > 
> > My intuition is that a user who writes `L"foo"` will expect a wide `"foo"` 
> > to appear in the binary in the cases where the string ends up making it 
> > that far.
> > 
> > > L is really is want to convert that string _at that point_. in an 
> > > attribute, strings might have multiple usages so it's better to delay any 
> > > transcoding.
> > > Does that make sense?
> > 
> > Not yet, but I might get there eventually. :-D My concern is that vendor 
> > attributes can basically do anything, so there's no reason to assume that 
> > any given string literal usage should or should not transcode. I think we 
> > have to decide on a case by case basis by letting the attributes specify 
> > what they intend in their argument lists. However, my intuition is that 
> > *most* attributes will expect unevaluated string literals because the 
> > string argument doesn't get passed to LLVM.
> The status quo is that everything transcodes.
> 
> But not transcoding, we do not destroy any information as to what is in the 
> source.
> 
> If an attribute then wants to use the string later in such a way that it 
> needs to transcode to a literal encoding (or something else, for example, one 
> might imagine a fun scenario where literal are ASCII encoded and debug 
> information are EBCDIC encoded), then that can be done, because the string 
> still exists.
> 
> Whereas for literal specifically, we assume they will be evaluated by the 
> abstract machine as per phase 5 so we transcode them immediately. which is 
> destructive. we get away with it because the original spelling is in the 
> source if we need it, and currently, literals are also assumed to be 
> (potentially invalid because of `\x` escape sequences) UTF-8.
> 
> There is an alternative design where string literals are not transcoded until 
> lazily evaluated but I'm not sure there is a big motivation for that.
> 
> So this PR is exactly trying not to force a specific behavior on attributes 
> that I assume can be displayed, put into some form in the binary, or 
> converted to literal which might represent 3 distinct encodings. The parser 
> leaving them as Unicode is the least opinionated thing the parser can 
> possibly do.
> And then each attribute can decide for itself if it needs to transcode, and 
> how to handle any errors if they occur.
> An attribute might decide to keep both a Unicode and non-Unicode spelling 
> around if the string has a dual purpose, etc
> 
> 
> Question though, Is there a scenario in which `\x`/`\0` would actually be 
> useful in the context of attributes? Because if so, then we might need to do 
> something to allow that.
> Question though, Is there a scenario in which \x/\0 would actually be useful 
> in the context of attributes? Because if so, then we might need to do 
> something to allow that.

Emitting binary data is the biggest use case I can think of, but I don't think 
we have any Clang attributes that do this currently. It's possible there are 
plugin-based attributes that need that functionality, but it also seems 
unlikely.


================
Comment at: clang/lib/Sema/SemaDeclAttr.cpp:363
 
-  if (!Literal || !Literal->isAscii()) {
+  // TODO all StringLiteral here should be unevaluated
+
----------------
I'm not certain what's left to be TOdone here?


================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16077
-  if (!Lit->isAscii()) {
-    Diag(LangStr->getExprLoc(), diag::err_language_linkage_spec_not_ascii)
-      << LangStr->getSourceRange();
----------------
This diagnostic can be removed from DiagnosticSemaKinds.td now.


================
Comment at: clang/lib/Sema/SemaExpr.cpp:1825
+ExprResult Sema::ActOnUnevaluatedStringLiteral(ArrayRef<Token> StringToks) {
+  StringLiteralParser Literal(StringToks, PP, true);
+  if (Literal.hadError)
----------------



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Reply via email to