dgoldman added inline comments.

================
Comment at: clang/lib/Lex/PPLexerChange.cpp:436
+  if (PragmaAssumeNonNullLoc.isValid() && !this->PPOpts->GeneratePreamble &&
+      !(CurLexer && CurLexer->getFileID() == PredefinesFileID) &&
       !isEndOfMacro && !(CurLexer && CurLexer->Is_PragmaLexer)) {
----------------
sammccall wrote:
> dgoldman wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > dgoldman wrote:
> > > > > sammccall wrote:
> > > > > > dgoldman wrote:
> > > > > > > sammccall wrote:
> > > > > > > > what's the PredefinesFileID special case about?
> > > > > > > See ExitedFromPredefinesFile below, this is how I check if we're 
> > > > > > > falling off the end of the preamble region into the main file 
> > > > > > > (same as done for the conditional stack), LMK if there's a better 
> > > > > > > way.
> > > > > > I don't understand how this checks if we're falling off the 
> > > > > > preamble region, and the code around ExitedFromPredefinesFile 
> > > > > > doesn't clarify this for me. Can you explain?
> > > > > > 
> > > > > > The `if (ExitedFromPredefinesFile)` appears to be handling the 
> > > > > > logical *insertion* of preamble PP events when *consuming* a 
> > > > > > preamble, which is not what you're trying to do here.
> > > > > > 
> > > > > > The condition here is of the form "if we have an open pragma, and 
> > > > > > we're not generating a preamble, and some other stuff, then 
> > > > > > diagnose". So if the baseline case is "diagnose" and the preamble 
> > > > > > case is an exception, the "other stuff" clauses don't limit the 
> > > > > > preamble exception, they add extra exceptions!
> > > > > `!(CurLexer && CurLexer->getFileID() == PredefinesFileID)` makes sure 
> > > > > that if we're consuming the preamble, we don't emit the warning. 
> > > > > AFAICT this is the way to tell that the preamble is terminated, since 
> > > > > the current file ID being processed is the predefines file and the 
> > > > > file is now terminated as of this method call (since !isEndOfMacro && 
> > > > > !(CurLexer && CurLexer->Is_PragmaLexer)). 
> > > > > `!this->PPOpts->GeneratePreamble` makes sure that if we're generating 
> > > > > the preamble, we don't emit the warning. We need to special case both 
> > > > > cases otherwise we get an error when generating the preamble or when 
> > > > > we load the preamble before even processing the rest of the main file.
> > > > > 
> > > > > Does that makes sense? 
> > > > > !(CurLexer && CurLexer->getFileID() == PredefinesFileID) makes sure 
> > > > > that if we're consuming the preamble, we don't emit the warning
> > > > 
> > > > 1) if we're consuming the preamble, what exactly is the sequence of 
> > > > events that would otherwise lead us to emit the warning?
> > > > 2) what if we're in the predefines file for some other reason instead?
> > > > 
> > > > I'm hoping you'll explain to me what you think the predefines file is, 
> > > > and what its relationship to the preamble is, and so why this condition 
> > > > is correct :-)
> > > > 
> > > > My understanding (which is pretty shaky!) is that the predefines file 
> > > > is a kind of catchall used to inject text into the preprocessor that 
> > > > doesn't appear in the source file - that it contains definitions of 
> > > > builtin integer types, macros defined by `-D` on the command line, and 
> > > > so on. If it has any relationship to the preamble, it's something 
> > > > subtle that's to do with the relative order in which the preprocessor 
> > > > sees entities like the predefines, the preprocesor, and the main file.
> > > > So according to my understanding, interpreting "reaching EOF in the 
> > > > predefines file" as "consuming a preamble" is either wrong or something 
> > > > very subtle requiring a significant comment.
> > > > 
> > > > The code you refer to below is doing something very different: it's 
> > > > saying that if we have a preamble, then reaching the end of the 
> > > > predefines file is the *trigger* to inject state from it!
> > > > 
> > > > > !this->PPOpts->GeneratePreamble makes sure that if we're generating 
> > > > > the preamble, we don't emit the warning
> > > > 
> > > > Sure, but that doesn't sound correct at all! A preamble mostly consists 
> > > > of parsing a lot of headers, and if any of those headers have an 
> > > > unpaired pragma, we should be warning on that at EOF. It's only if we 
> > > > hit the "pretend" EOF from the truncated main file that we want to 
> > > > suppress the warning.
> > > > the preprocessor sees entities like the predefines, the preprocesor, 
> > > > and the main file.
> > > Oops... sees the predefines, any injected events from the preamble, and 
> > > the main file.
> > > 
> > I think that definition is correct, but given that there is only one 
> > predefines file, when it ends, it must go into the main file, no? And if we 
> > have a pragma assume nonnull loc, it shouldn't be from the clang builtins, 
> > it should be from the preamble (I'd certainly hope!)
> > 
> > re: headers, I think there might be a misunderstanding, as soon as an 
> > #include is seen the pragma is ended here: 
> > https://github.com/llvm/llvm-project/blob/7631c366c8589dda488cb7ff1df26cc134002208/clang/lib/Lex/PPDirectives.cpp#L2006,
> >  so if PragmaAssumeNonNullLoc is valid here we haven't had any includes 
> > since the pragma was seen.
> > I think that definition is correct, but given that there is only one 
> > predefines file, when it ends, it must go into the main file, no?
> 
> I'm sorry, but i really can't follow what you're asking, or how it relates to 
> the original problem, or what the problem even is. If *you're* not sure, it's 
> probably worth tracing execution of the preprocessor and describing the 
> precise sequence of events that cause a problem. And then trying to come up 
> with a *direct* way of solving them.
> 
> Here's my *guess* as to what's going on (but please check!):
>  - the assume_nonnull state is very sensitive to shifts between files - it's 
> supposed to be off before reaching EOF and before #include. (This is why it 
> can appear to be file-local syntax but be implemented as a global flag).
>  - therefore the AssumeNonnullLoc is not just some property of the 
> preprocessor, it's a transient state that should be set at precisely the 
> right point in the program. However you're setting it at the very beginning, 
> as a side effect of ReadASTBlock().
>  - when consuming a preamble, the preprocessor simulates a structure like:
> ```
> // AssumeNonnullLoc is being set here
> #include <predefines> // implicit
> #inject pp conditional stack
> // AssumeNonnnullLoc *should* be set here
> // now continue with the post-preamble part of main file
> ```
>  - because AssumeNonnullLoc is set too early, the PP events in between these 
> locations mess with it. Fortunately that is probably only the predefines, 
> although that's a brittle assumption to make. And we don't run into problems 
> when entering the predefines file, because `HandleHeaderIncludeOrImport` 
> isn't called, we just enter the file directly. However you do run into issues 
> when exiting that file again, and so a special case works around those.
> 
> If this is right, it's not a good solution. There are other observable 
> effects of setting AssumeNonnullLoc in the wrong place. All the predefines 
> will be parsed in "assume nonnull mode". And if any files are #included from 
> the predefines file, then the enter/exit events will cause problems. 
> Predefines may contain `#include` with the `-include` and `-imacros` clang 
> flags, and due to clangd's preamble patching.
> 
> In this case the right fix is to set AssumeNonnullLoc at the correct point in 
> the file: triggered by exiting the predefines, along with the PP conditional 
> stack. The way to do this is probably to have a second property on 
> preprocessor which stores the loc that will be restored when the trigger 
> occurs. (Similar to how PP distinguishes between the *current* conditional 
> stack and the replayable one set by ASTReader when it calls 
> `PP.setReplayablePreambleConditionalStack(ConditionalStack, SkipInfo);`
> 
> > re: headers, I think there might be a misunderstanding, as soon as an 
> > #include is seen the pragma is ended here
> 
> The scenario is:
> 
> We are building a preamble from main.c
> main.c:
> ```
> #include "foo.h"
> int x;
> ```
> foo.h:
> ```
>  #pragma clang assume_nonnull begin
> void foo(int*);
> // missing end pragma
> ```
> We should issue a diagnostic when we hit the end of foo.h, but with the code 
> you have here it will be supposed instead. (There's no #include in scope of 
> the pragma here, so I don't think that's relevant)
🤦 Thanks, that example is super helpful, not sure why I wasn't able to 
understand it earlier. Yeah, this doesn't properly handle EOF for headers 
during preamble generation and we should only set the assume_nonnull loc only 
when transitioning, not earlier. Like you said, we need to basically mirror the 
replay/record logic for the preamble conditional stack for it to work properly. 
I went ahead and implemented the approach that you mentioned to restore the 
value when exiting from the predefines as well as added a test to clangd for 
the edge case you gave. PTAL, I kept the storage separate but I wonder if it 
would be better to merge in with the existing replay logic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122179/new/

https://reviews.llvm.org/D122179

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to