cor3ntin added inline comments.
================ Comment at: clang/lib/Lex/Lexer.cpp:3378-3379 - if (LooseMatch) + // If no diagnostic has been emitted yet we do not want to recover here + // to make sure this function will be called again and a diagnostic emitted then. + if (LooseMatch && Diagnose) ---------------- tahonermann wrote: > cor3ntin wrote: > > tahonermann wrote: > > > I'm having a hard time understanding what this comment is trying to > > > convey. The comment response to Aaron was much more helpful to me. How > > > about this suggested edit? > > It's not that a diagnostic would be emitted twice, is that it would not be > > emitted at all. > > By recovering - ie returning a loose match - we prevent the function to be > > called again on the same buffer. > > So if during the first call Diagnose is false (because we were called from > > a tryXXXX function), then the compilation will continue assuming we were in > > fact able to parse an identifier and never informs us of an invalid name. > More evidence of the difficulty I'm having appreciating this comment :) > > Does this sound right? > // Only map a loose match to a code point when in a diagnosing state. > // If a mapped code point were to be returned in a non-diagnosing state, > // token caching would prevent the opportunity to issue a diagnostic when > // the token is later used. > > Tangent: It might be worth renaming one of `Res` and `Result`. I keep getting > confused due to the similar names. Perhaps rename `Result` to `Token` or > `ResultToken`? I'm not sure caching is exactly right. There are tentative parses, that should not give false positive answers. I'll try to message a better comment ================ Comment at: clang/lib/Lex/Lexer.cpp:3386 + + if (!Result || CurPtr - StartPtr == (ptrdiff_t)(Buffer.size() + 4)) StartPtr = CurPtr; ---------------- tahonermann wrote: > cor3ntin wrote: > > cor3ntin wrote: > > > tahonermann wrote: > > > > This is a bit of a tangent, but under what circumstances would `CurPtr > > > > - StartPtr` not be equal to `Buffer.size() + 4`? Actually, I'm not sure > > > > that +4 is correct. It looks like `StartPtr` is expected to point to > > > > `N` at the beginning of the function, so the initial `\` is not > > > > included in the range. If `N` isn't seen, the function returns early. > > > > Likewise, if either of the `{` or `}` delimiters is not found, the > > > > function returns early. > > > > > > > > I think the goal of this code is to skip the `getAndAdvanceChar()` > > > > machinery when the buffer is being claimed (no need to parse UCNs or > > > > trigraphs in the character name), but it looks like, particularly with > > > > this DR, that we might always be able to do that. > > > afaict, 4 is correct here because we are one past-the-end. > > > I do however agree that this whole pattern which is used a few times is > > > probably unnecessary, i do think it would be good to investigate. Not in > > > this patch though, imo > > I looked into it, I'm sure we could improve but not easily, > > `getAndAdvanceChar` does set some flags on the token in the presence of > > trigraphs/line splicing and we need those flags to be set, this is the > > reason we do need to call that method. > > It's not super efficient but it's such an edge case... I'd rather not touch > > that now > My concern is that, as is, the code always takes the `else` branch (except > when `Result` is non-null). Assuming a buffer containing "X", the pointers > are arranged as follows (where `$` is one past the end). > \ N { X } $ > | | `- CurPtr > | `- Buffer > `- StartPtr > `CurPtr - StartPtr` is 4, but `Buffer.size() + 4` is 5 (`Buffer.size()` is 1 > in this case). > > I think there might be an easy test to see if this is working as intended. If > it isn't, I would expect a diagnostic to be issued if trigraphs are enabled > and the buffer contains a trigraph sequence. Something like: > \N{LOTUS??>} I can try to add tests > My concern is that, as is, the code always takes the else branch (except when > Result is non-null). Yes, the if branch sole purpose is to set some state in Result. At the start of the function, StartPtr points to `\` And I'll leave a comment, maybe that will clear up future confusions There may be a potential refactor here, which is to have `getAndAdvanceChar` take a `bool & ShouldCleanupToken` parameter instead of a token so that we don't have to do that dance, but it's a bit outside of the scope of this patch... ================ Comment at: clang/test/Preprocessor/ucn-pp-identifier.c:141 +// expected-warning@-2 {{incomplete delimited universal character name}} #ifdef TRIGRAPHS ---------------- tahonermann wrote: > How about adding a test for an escaped new line? I think this is currently UB > according to the standard ([[ http://eel.is/c++draft/lex.phases#1.2 | > (lex.phases)p2 ]]), but I believe you are trying to change that with [[ > https://wg21.link/p2621 | P2621 ]]. > int \N{\ > LATIN CAPITAL LETTER A WITH GRAVE}; I can add a test, sure P2621 is not super relevant (or rather, it standardize behaviors that clang already has) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D138861/new/ https://reviews.llvm.org/D138861 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits