aaron.ballman added a comment. Should we also change this to parse an unevaluated string literal: https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/ModuleMap.cpp#L1594 Similar question for all the uses in OpenMP stemming from: https://github.com/llvm/llvm-project/blob/main/clang/lib/Parse/ParseOpenMP.cpp#L825 (https://www.openmp.org/spec-html/5.1/openmp.html is the OpenMP spec but I didn't see any mention of encoding prefixes when I peeked)
I think we still need to handle UDLs so that we parse the declaration's `""` as an unevaluated string literal. ================ Comment at: clang/include/clang/AST/Expr.h:1845 StringRef getString() const { - assert(getCharByteWidth() == 1 && + assert((isUnevaluated() || getCharByteWidth() == 1) && "This function is used in places that assume strings use char"); ---------------- Do we also want to assert that if it is unevaluated, it's char byte width *is* one byte? (No such thing as a multibyte unevaluated string literal.) ================ Comment at: clang/include/clang/Basic/DiagnosticParseKinds.td:58 + def warn_misleading_indentation : Warning< ---------------- You can revert the unintended whitespace change to drop this whole file from the review. ================ Comment at: clang/lib/AST/Expr.cpp:1082 StringLiteralBits.NumConcatenated = NumConcatenated; + StringLiteralBits.CharByteWidth = 1; + ---------------- This should be in an `else` clause along with `StringLiteralBits.IsPascal = false;`. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1648 // have the same ud-suffix. - if (UDSuffixBuf != UDSuffix) { + const bool UnevaluatedStringHasUDL = Unevaluated && !UDSuffix.empty(); + if (UDSuffixBuf != UDSuffix || UnevaluatedStringHasUDL) { ---------------- ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:95-96 + case '?': + case 'n': + case 't': + return true; ---------------- aaron.ballman wrote: > cor3ntin wrote: > > aaron.ballman wrote: > > > Do you intend to miss a bunch of escapes like `\'` and `\r` (etc)? > > \' is there. I am less sure about '\r' and '\a'. for example. This is > > something I realized after writing P2361. > > what does '\a` in static assert mean? even '\r' is not so obvious > Looking at the list again, I think only `\a` is really of interest here. I > know some folks like @jfb have mentioned that `\a` could be used to generate > an alert sound on a terminal, which is a somewhat useful feature for a failed > static assertion if you squint at it hard enough. > > But the rest of the missing ones do seem more questionable to support. @jfb and @cor3ntin -- any opinions on whether `\a` should be supported? My opinion is that it should be supported because it has some utility for anyone running the compiler from a command line, but it's a pretty weak opinion. ================ Comment at: clang/lib/Lex/Pragma.cpp:807 if (Tok.is(tok::string_literal) && !Tok.hasUDSuffix()) { StringLiteralParser Literal(Tok, PP); if (Literal.hadError) ---------------- Should this also be modified? ================ Comment at: clang/lib/Parse/ParseDecl.cpp:374-376 + ParsedAttr::Kind AttrKind = + ParsedAttr::getParsedKind(AttrName, ScopeName, Syntax); + ---------------- I don't think this needed to move? ================ Comment at: clang/lib/Parse/ParseDecl.cpp:422-423 - ExprResult ArgExpr( - Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression())); + ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr( + ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind))); if (ArgExpr.isInvalid()) { ---------------- cor3ntin wrote: > aaron.ballman wrote: > > cor3ntin wrote: > > > aaron.ballman wrote: > > > > Hmmm, I'm not certain about these changes. > > > > > > > > For some attributes, the standard currently requires accepting any kind > > > > of string literal (like `[[deprecated]]` > > > > https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to > > > > change that, but it's not yet accepted by WG21 (let alone WG14). So > > > > giving errors in those cases is a bit of a hard sell -- I think > > > > warnings would be a bit more reasonable. > > > > > > > > But for other attributes (like `annotate`), it's a bit less clear > > > > whether we should *prevent* literal prefixes because the attribute can > > > > be used to have runtime impacts (for example, I can imagine someone > > > > using `annotate` to emit the string literal bytes into the resulting > > > > binary). In some cases, I think it's very reasonable (e.g., > > > > `diagnose_if` should behave the same as `deprecated` and `nodiscard` > > > > because those are purely about generating diagnostics at compile time). > > > > > > > > I kind of wonder whether we're going to want to tablegenerate whether > > > > the argument needs to be parsed as unevaluated or not on an > > > > attribute-by-attribute basis. > > > Yep, I would not expect this to get merge before P2361 but I think the > > > implementation experience is useful and raised a bunch of good questions. > > > I don't think it ever makes sense to have `L` outside of literals - but > > > people *might* do it currently, in which case there is a concern about > > > whether it breaks code (I have found no evidence of that though). > > > > > > If we wanted to inject these strings in the binary - in some form, then > > > we might have to transcode them at that point. > > > I don't think the user would know if the string would be injected as wide > > > or narrow (or something else) by the compiler. > > > > > > `L` is really is want to convert that string _at that point_. in an > > > attribute, strings might have multiple usages so it's better to delay any > > > transcoding. > > > Does that make sense? > > > > > > But I agree that a survey of what each attribute expects is in order. > > > > > > > > > > > > Yep, I would not expect this to get merge before P2361 but I think the > > > implementation experience is useful and raised a bunch of good questions. > > > > Absolutely agreed, this is worthwhile effort! > > > > > If we wanted to inject these strings in the binary - in some form, then > > > we might have to transcode them at that point. > > > I don't think the user would know if the string would be injected as wide > > > or narrow (or something else) by the compiler. > > > > My intuition is that a user who writes `L"foo"` will expect a wide `"foo"` > > to appear in the binary in the cases where the string ends up making it > > that far. > > > > > L is really is want to convert that string _at that point_. in an > > > attribute, strings might have multiple usages so it's better to delay any > > > transcoding. > > > Does that make sense? > > > > Not yet, but I might get there eventually. :-D My concern is that vendor > > attributes can basically do anything, so there's no reason to assume that > > any given string literal usage should or should not transcode. I think we > > have to decide on a case by case basis by letting the attributes specify > > what they intend in their argument lists. However, my intuition is that > > *most* attributes will expect unevaluated string literals because the > > string argument doesn't get passed to LLVM. > The status quo is that everything transcodes. > > But not transcoding, we do not destroy any information as to what is in the > source. > > If an attribute then wants to use the string later in such a way that it > needs to transcode to a literal encoding (or something else, for example, one > might imagine a fun scenario where literal are ASCII encoded and debug > information are EBCDIC encoded), then that can be done, because the string > still exists. > > Whereas for literal specifically, we assume they will be evaluated by the > abstract machine as per phase 5 so we transcode them immediately. which is > destructive. we get away with it because the original spelling is in the > source if we need it, and currently, literals are also assumed to be > (potentially invalid because of `\x` escape sequences) UTF-8. > > There is an alternative design where string literals are not transcoded until > lazily evaluated but I'm not sure there is a big motivation for that. > > So this PR is exactly trying not to force a specific behavior on attributes > that I assume can be displayed, put into some form in the binary, or > converted to literal which might represent 3 distinct encodings. The parser > leaving them as Unicode is the least opinionated thing the parser can > possibly do. > And then each attribute can decide for itself if it needs to transcode, and > how to handle any errors if they occur. > An attribute might decide to keep both a Unicode and non-Unicode spelling > around if the string has a dual purpose, etc > > > Question though, Is there a scenario in which `\x`/`\0` would actually be > useful in the context of attributes? Because if so, then we might need to do > something to allow that. > Question though, Is there a scenario in which \x/\0 would actually be useful > in the context of attributes? Because if so, then we might need to do > something to allow that. Emitting binary data is the biggest use case I can think of, but I don't think we have any Clang attributes that do this currently. It's possible there are plugin-based attributes that need that functionality, but it also seems unlikely. ================ Comment at: clang/lib/Sema/SemaDeclAttr.cpp:363 - if (!Literal || !Literal->isAscii()) { + // TODO all StringLiteral here should be unevaluated + ---------------- I'm not certain what's left to be TOdone here? ================ Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16077 - if (!Lit->isAscii()) { - Diag(LangStr->getExprLoc(), diag::err_language_linkage_spec_not_ascii) - << LangStr->getSourceRange(); ---------------- This diagnostic can be removed from DiagnosticSemaKinds.td now. ================ Comment at: clang/lib/Sema/SemaExpr.cpp:1825 +ExprResult Sema::ActOnUnevaluatedStringLiteral(ArrayRef<Token> StringToks) { + StringLiteralParser Literal(StringToks, PP, true); + if (Literal.hadError) ---------------- Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D105759/new/ https://reviews.llvm.org/D105759 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits