tahonermann added inline comments.
================
Comment at: clang/lib/Lex/Lexer.cpp:3386
+
+ if (!Result || CurPtr - StartPtr == (ptrdiff_t)(Buffer.size() + 4))
StartPtr = CurPtr;
----------------
cor3ntin wrote:
> tahonermann wrote:
> > cor3ntin wrote:
> > > cor3ntin wrote:
> > > > tahonermann wrote:
> > > > > This is a bit of a tangent, but under what circumstances would
> > > > > `CurPtr - StartPtr` not be equal to `Buffer.size() + 4`? Actually,
> > > > > I'm not sure that +4 is correct. It looks like `StartPtr` is expected
> > > > > to point to `N` at the beginning of the function, so the initial `\`
> > > > > is not included in the range. If `N` isn't seen, the function returns
> > > > > early. Likewise, if either of the `{` or `}` delimiters is not found,
> > > > > the function returns early.
> > > > >
> > > > > I think the goal of this code is to skip the `getAndAdvanceChar()`
> > > > > machinery when the buffer is being claimed (no need to parse UCNs or
> > > > > trigraphs in the character name), but it looks like, particularly
> > > > > with this DR, that we might always be able to do that.
> > > > afaict, 4 is correct here because we are one past-the-end.
> > > > I do however agree that this whole pattern which is used a few times is
> > > > probably unnecessary, i do think it would be good to investigate. Not
> > > > in this patch though, imo
> > > I looked into it, I'm sure we could improve but not easily,
> > > `getAndAdvanceChar` does set some flags on the token in the presence of
> > > trigraphs/line splicing and we need those flags to be set, this is the
> > > reason we do need to call that method.
> > > It's not super efficient but it's such an edge case... I'd rather not
> > > touch that now
> > My concern is that, as is, the code always takes the `else` branch (except
> > when `Result` is non-null). Assuming a buffer containing "X", the pointers
> > are arranged as follows (where `$` is one past the end).
> > \ N { X } $
> > | | `- CurPtr
> > | `- Buffer
> > `- StartPtr
> > `CurPtr - StartPtr` is 4, but `Buffer.size() + 4` is 5 (`Buffer.size()` is
> > 1 in this case).
> >
> > I think there might be an easy test to see if this is working as intended.
> > If it isn't, I would expect a diagnostic to be issued if trigraphs are
> > enabled and the buffer contains a trigraph sequence. Something like:
> > \N{LOTUS??>}
> I can try to add tests
>
> > My concern is that, as is, the code always takes the else branch (except
> > when Result is non-null).
> Yes, the if branch sole purpose is to set some state in Result.
>
> At the start of the function, StartPtr points to `\`
> And I'll leave a comment, maybe that will clear up future confusions
>
> There may be a potential refactor here, which is to have `getAndAdvanceChar`
> take a `bool & ShouldCleanupToken` parameter instead of a token so that we
> don't have to do that dance, but it's a bit outside of the scope of this
> patch...
> At the start of the function, StartPtr points to `\`
It doesn't look like it does. The first use of `StartPtr` is at line 3314 and
it expects to read `N`:
3314: char C = getCharAndSize(StartPtr, CharSize);
3315: assert(C == 'N' && "expected \\N{...}");
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138861/new/
https://reviews.llvm.org/D138861
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits