dgoldman added inline comments.
================ Comment at: clang/lib/Lex/PPLexerChange.cpp:436 + if (PragmaAssumeNonNullLoc.isValid() && !this->PPOpts->GeneratePreamble && + !(CurLexer && CurLexer->getFileID() == PredefinesFileID) && !isEndOfMacro && !(CurLexer && CurLexer->Is_PragmaLexer)) { ---------------- sammccall wrote: > dgoldman wrote: > > sammccall wrote: > > > sammccall wrote: > > > > dgoldman wrote: > > > > > sammccall wrote: > > > > > > dgoldman wrote: > > > > > > > sammccall wrote: > > > > > > > > what's the PredefinesFileID special case about? > > > > > > > See ExitedFromPredefinesFile below, this is how I check if we're > > > > > > > falling off the end of the preamble region into the main file > > > > > > > (same as done for the conditional stack), LMK if there's a better > > > > > > > way. > > > > > > I don't understand how this checks if we're falling off the > > > > > > preamble region, and the code around ExitedFromPredefinesFile > > > > > > doesn't clarify this for me. Can you explain? > > > > > > > > > > > > The `if (ExitedFromPredefinesFile)` appears to be handling the > > > > > > logical *insertion* of preamble PP events when *consuming* a > > > > > > preamble, which is not what you're trying to do here. > > > > > > > > > > > > The condition here is of the form "if we have an open pragma, and > > > > > > we're not generating a preamble, and some other stuff, then > > > > > > diagnose". So if the baseline case is "diagnose" and the preamble > > > > > > case is an exception, the "other stuff" clauses don't limit the > > > > > > preamble exception, they add extra exceptions! > > > > > `!(CurLexer && CurLexer->getFileID() == PredefinesFileID)` makes sure > > > > > that if we're consuming the preamble, we don't emit the warning. > > > > > AFAICT this is the way to tell that the preamble is terminated, since > > > > > the current file ID being processed is the predefines file and the > > > > > file is now terminated as of this method call (since !isEndOfMacro && > > > > > !(CurLexer && CurLexer->Is_PragmaLexer)). > > > > > `!this->PPOpts->GeneratePreamble` makes sure that if we're generating > > > > > the preamble, we don't emit the warning. We need to special case both > > > > > cases otherwise we get an error when generating the preamble or when > > > > > we load the preamble before even processing the rest of the main file. > > > > > > > > > > Does that makes sense? > > > > > !(CurLexer && CurLexer->getFileID() == PredefinesFileID) makes sure > > > > > that if we're consuming the preamble, we don't emit the warning > > > > > > > > 1) if we're consuming the preamble, what exactly is the sequence of > > > > events that would otherwise lead us to emit the warning? > > > > 2) what if we're in the predefines file for some other reason instead? > > > > > > > > I'm hoping you'll explain to me what you think the predefines file is, > > > > and what its relationship to the preamble is, and so why this condition > > > > is correct :-) > > > > > > > > My understanding (which is pretty shaky!) is that the predefines file > > > > is a kind of catchall used to inject text into the preprocessor that > > > > doesn't appear in the source file - that it contains definitions of > > > > builtin integer types, macros defined by `-D` on the command line, and > > > > so on. If it has any relationship to the preamble, it's something > > > > subtle that's to do with the relative order in which the preprocessor > > > > sees entities like the predefines, the preprocesor, and the main file. > > > > So according to my understanding, interpreting "reaching EOF in the > > > > predefines file" as "consuming a preamble" is either wrong or something > > > > very subtle requiring a significant comment. > > > > > > > > The code you refer to below is doing something very different: it's > > > > saying that if we have a preamble, then reaching the end of the > > > > predefines file is the *trigger* to inject state from it! > > > > > > > > > !this->PPOpts->GeneratePreamble makes sure that if we're generating > > > > > the preamble, we don't emit the warning > > > > > > > > Sure, but that doesn't sound correct at all! A preamble mostly consists > > > > of parsing a lot of headers, and if any of those headers have an > > > > unpaired pragma, we should be warning on that at EOF. It's only if we > > > > hit the "pretend" EOF from the truncated main file that we want to > > > > suppress the warning. > > > > the preprocessor sees entities like the predefines, the preprocesor, > > > > and the main file. > > > Oops... sees the predefines, any injected events from the preamble, and > > > the main file. > > > > > I think that definition is correct, but given that there is only one > > predefines file, when it ends, it must go into the main file, no? And if we > > have a pragma assume nonnull loc, it shouldn't be from the clang builtins, > > it should be from the preamble (I'd certainly hope!) > > > > re: headers, I think there might be a misunderstanding, as soon as an > > #include is seen the pragma is ended here: > > https://github.com/llvm/llvm-project/blob/7631c366c8589dda488cb7ff1df26cc134002208/clang/lib/Lex/PPDirectives.cpp#L2006, > > so if PragmaAssumeNonNullLoc is valid here we haven't had any includes > > since the pragma was seen. > > I think that definition is correct, but given that there is only one > > predefines file, when it ends, it must go into the main file, no? > > I'm sorry, but i really can't follow what you're asking, or how it relates to > the original problem, or what the problem even is. If *you're* not sure, it's > probably worth tracing execution of the preprocessor and describing the > precise sequence of events that cause a problem. And then trying to come up > with a *direct* way of solving them. > > Here's my *guess* as to what's going on (but please check!): > - the assume_nonnull state is very sensitive to shifts between files - it's > supposed to be off before reaching EOF and before #include. (This is why it > can appear to be file-local syntax but be implemented as a global flag). > - therefore the AssumeNonnullLoc is not just some property of the > preprocessor, it's a transient state that should be set at precisely the > right point in the program. However you're setting it at the very beginning, > as a side effect of ReadASTBlock(). > - when consuming a preamble, the preprocessor simulates a structure like: > ``` > // AssumeNonnullLoc is being set here > #include <predefines> // implicit > #inject pp conditional stack > // AssumeNonnnullLoc *should* be set here > // now continue with the post-preamble part of main file > ``` > - because AssumeNonnullLoc is set too early, the PP events in between these > locations mess with it. Fortunately that is probably only the predefines, > although that's a brittle assumption to make. And we don't run into problems > when entering the predefines file, because `HandleHeaderIncludeOrImport` > isn't called, we just enter the file directly. However you do run into issues > when exiting that file again, and so a special case works around those. > > If this is right, it's not a good solution. There are other observable > effects of setting AssumeNonnullLoc in the wrong place. All the predefines > will be parsed in "assume nonnull mode". And if any files are #included from > the predefines file, then the enter/exit events will cause problems. > Predefines may contain `#include` with the `-include` and `-imacros` clang > flags, and due to clangd's preamble patching. > > In this case the right fix is to set AssumeNonnullLoc at the correct point in > the file: triggered by exiting the predefines, along with the PP conditional > stack. The way to do this is probably to have a second property on > preprocessor which stores the loc that will be restored when the trigger > occurs. (Similar to how PP distinguishes between the *current* conditional > stack and the replayable one set by ASTReader when it calls > `PP.setReplayablePreambleConditionalStack(ConditionalStack, SkipInfo);` > > > re: headers, I think there might be a misunderstanding, as soon as an > > #include is seen the pragma is ended here > > The scenario is: > > We are building a preamble from main.c > main.c: > ``` > #include "foo.h" > int x; > ``` > foo.h: > ``` > #pragma clang assume_nonnull begin > void foo(int*); > // missing end pragma > ``` > We should issue a diagnostic when we hit the end of foo.h, but with the code > you have here it will be supposed instead. (There's no #include in scope of > the pragma here, so I don't think that's relevant) 🤦 Thanks, that example is super helpful, not sure why I wasn't able to understand it earlier. Yeah, this doesn't properly handle EOF for headers during preamble generation and we should only set the assume_nonnull loc only when transitioning, not earlier. Like you said, we need to basically mirror the replay/record logic for the preamble conditional stack for it to work properly. I went ahead and implemented the approach that you mentioned to restore the value when exiting from the predefines as well as added a test to clangd for the edge case you gave. PTAL, I kept the storage separate but I wonder if it would be better to merge in with the existing replay logic. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D122179/new/ https://reviews.llvm.org/D122179 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits