jansvoboda11 added a subscriber: rsmith. jansvoboda11 added inline comments.
================ Comment at: clang/lib/Serialization/ASTReader.cpp:6343 "Invalid data, missing pragma diagnostic states"); - SourceLocation Loc = ReadSourceLocation(F, Record[Idx++]); - auto IDAndOffset = SourceMgr.getDecomposedLoc(Loc); - assert(IDAndOffset.first.isValid() && "invalid FileID for transition"); - assert(IDAndOffset.second == 0 && "not a start location for a FileID"); + FileID FID = ReadFileID(F, Record, Idx); + assert(FID.isValid() && "invalid FileID for transition"); ---------------- dexonsmith wrote: > alexfh wrote: > > alexfh wrote: > > > dexonsmith wrote: > > > > alexfh wrote: > > > > > alexfh wrote: > > > > > > jansvoboda11 wrote: > > > > > > > alexfh wrote: > > > > > > > > dexonsmith wrote: > > > > > > > > > eaeltsin wrote: > > > > > > > > > > This doesn't work as before, likely because ReadFileID > > > > > > > > > > doesn't do TranslateSourceLocation. > > > > > > > > > > > > > > > > > > > > Our tests fail. > > > > > > > > > > > > > > > > > > > > I tried calling TranslateSourceLocation here and the tests > > > > > > > > > > passed: > > > > > > > > > > ``` > > > > > > > > > > SourceLocation Loc = > > > > > > > > > > Diag.SourceMgr->getComposedLoc(FID, 0); > > > > > > > > > > SourceLocation Loc2 = TranslateSourceLocation(F, Loc); > > > > > > > > > > auto IDAndOffset = SourceMgr.getDecomposedLoc(Loc2); > > > > > > > > > > > > > > > > > > > > // Note that we don't need to set up > > > > > > > > > > Parent/ParentOffset here, because > > > > > > > > > > // we won't be changing the diagnostic state within > > > > > > > > > > imported FileIDs > > > > > > > > > > // (other than perhaps appending to the main source > > > > > > > > > > file, which has no > > > > > > > > > > // parent). > > > > > > > > > > auto &F = > > > > > > > > > > Diag.DiagStatesByLoc.Files[IDAndOffset.first]; > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > Sorry I don't know the codebase, so this fix is definitely > > > > > > > > > > ugly :) But it shows the problem. > > > > > > > > > > > > > > > > > > > I don't think that's the issue, since `ReadFileID()` calls > > > > > > > > > `TranslateFileID`, which should seems like it should be > > > > > > > > > equivalent. > > > > > > > > > > > > > > > > > > However, I notice that the post-increment for `Idx` got > > > > > > > > > dropped! Can you try replacing the line of code with the > > > > > > > > > following and see if that fixes your tests (without any extra > > > > > > > > > TranslateSourceLocation logic)? > > > > > > > > > ``` > > > > > > > > > lang=c++ > > > > > > > > > FileID FID = ReadFileID(F, Record, Idx++); > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > If so, maybe you can contribute that fix with a reduced > > > > > > > > > testcase? I suggest adding me, @vsapsai, @Bigcheese, and > > > > > > > > > @jansvoboda11 as reviewers. > > > > > > > > > > > > > > > > > > @alexfh, maybe you can check if this fixes your tests as well? > > > > > > > > > > > > > > > > > > (If this is the issue, it's a bit surprising we don't have > > > > > > > > > existing tests covering this case... and embarrassing I > > > > > > > > > missed it when reviewing initially!) > > > > > > > > I've noticed the dropped `Idx` post-increment as well, but I > > > > > > > > went a step further and looked at the `ReadFileID` > > > > > > > > implementation, which is actually doing a post-increment > > > > > > > > itself, and accepts `Idx` by reference: > > > > > > > > ``` > > > > > > > > FileID ReadFileID(ModuleFile &F, const RecordDataImpl &Record, > > > > > > > > unsigned &Idx) const { > > > > > > > > return TranslateFileID(F, FileID::get(Record[Idx++])); > > > > > > > > } > > > > > > > > ``` > > > > > > > > > > > > > > > > Thus, it seems to be correct. But what @eaeltsin has found is > > > > > > > > actually a problem for us. I'm currently trying to make an > > > > > > > > isolated test case, but it's quite tricky (as header modules > > > > > > > > are =\). It may be the case that our build setup relies on > > > > > > > > something clang doesn't explicitly promises, but the fact is > > > > > > > > that the behavior (as observed by our build setup) has changed. > > > > > > > > I'll try to revert the commit for now to get us unblocked and > > > > > > > > provide a test case as soon as I can. > > > > > > > Thanks for helping out @dexonsmith, we did have the week off. > > > > > > > > > > > > > > @eaeltsin @alexfhDone, are you able to provide the failing test > > > > > > > case? I'm happy to look into it and re-land this with a fix. > > > > > > I've spent multiple hours trying to extract an observable test > > > > > > case. It turned out to be too hairy of a yaq to shave: for each > > > > > > compilation a separate sandboxed environment is created with a > > > > > > separate symlink tree with just the inputs necessary for that > > > > > > action. Some of the inputs are prebuilt module files (e.g. for > > > > > > libc++) that are version-locked with the compiler. So far @jgorbe > > > > > > and I could reduce this to four compilation steps with their own > > > > > > symlink trees with inputs. While I could figure out some of the > > > > > > factors that affect reproducibility (for example, symlinks are > > > > > > important, since making a deep copy of the input directories makes > > > > > > the issue disappear), it will take a few more hours of concentrated > > > > > > yak shaving to bring this to a shareable state. I'm not sure I have > > > > > > much more time to sink into investigating this. > > > > > > > > > > > > It seems like examining code based on @eaeltsin's finding may be a > > > > > > more fruitful path to synthesizing a regression test. Could you try > > > > > > following that path? > > > > > One more observation: `-fmodules-embed-all-files` and > > > > > `-Wno-mismatched-tags` compiler options turned out to be important. > > > > Maybe @eaeltsin can help, but I don't see any reason to think that > > > > testcase will be easier. Typically we don't revert without a testcase > > > > or at least some way to understand the problem and make progress. > > > > > > > > (Maybe @jansvoboda11 has ideas for extra instrumentation in the > > > > compiler to better understand what's going on with your setup?) > > > I've managed to get rid of the precompiled module files and now I have > > > something much more observable. It will take some more time to brush it > > > up though. > > The standalone repro is here: {F25463527} > > > > The files should be unpacked to a directory (e.g. /tmp/repro/standalone) > > and the repro.sh script can be invoked with CLANG environment variable > > pointing to the clang binary being tested. The script will create two > > symlink trees and start clang twice. The second clang invocation fails with > > clang containing this patch. > > > > ``` > > $ CLANG=/tmp/repro/clang-good /tmp/repro/standalone/repro.sh > > ++ dirname /tmp/repro/standalone/repro.sh > > + DIR=/tmp/repro/standalone > > + cd /tmp/repro/standalone > > + rm -rf 1 2 > > + mkdir -p 1/q/a_proto/mypkg 1/p/a_proto/mypkg 2/q/a_proto/mypkg > > 2/p/b_proto/mypkg > > + ln -sf /tmp/repro/standalone/files/a_proto.cppmap 1/ > > + ln -sf /tmp/repro/standalone/files/a.pb.h > > /tmp/repro/standalone/files/a.proto.h 1/p/a_proto/mypkg/ > > + ln -sf /tmp/repro/standalone/files/a.pb.h > > /tmp/repro/standalone/files/a.proto.h 1/q/a_proto/mypkg/ > > + ln -sf /tmp/repro/standalone/files/a_proto.cppmap > > /tmp/repro/standalone/files/b_proto.cppmap 2/ > > + ln -sf /tmp/repro/standalone/files/a.pb.h 2/q/a_proto/mypkg/ > > + ln -sf /tmp/repro/standalone/files/b.pb.h 2/p/b_proto/mypkg/ > > + CC_ARGS=('-Wno-mismatched-tags' '-Xclang=-fmodules-embed-all-files' > > '-Xclang=-fmodules-local-submodule-visibility' '-fmodules' > > '-fno-implicit-modules' '-fno-implicit-module-maps' '-std=gnu++17' > > '-Xclang=-fmodule-map-file-home-is-cwd') > > + export CC_ARGS > > + : /tmp > > + cd /tmp/repro/standalone/1 > > + /tmp/repro/clang-good -Wno-mismatched-tags > > -Xclang=-fmodules-embed-all-files > > -Xclang=-fmodules-local-submodule-visibility -fmodules > > -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 > > -Xclang=-fmodule-map-file-home-is-cwd -I q/a_proto -fmodule-name=//a > > -fmodule-map-file=a_proto.cppmap -xc++ -Xclang=-emit-module -c > > a_proto.cppmap -o /tmp/a.pcm > > + cd /tmp/repro/standalone/2 > > + /tmp/repro/clang-good -Wno-mismatched-tags > > -Xclang=-fmodules-embed-all-files > > -Xclang=-fmodules-local-submodule-visibility -fmodules > > -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 > > -Xclang=-fmodule-map-file-home-is-cwd -I q/b_proto -I q/a_proto > > -fmodule-name=//b -fmodule-map-file=b_proto.cppmap > > -Xclang=-fmodule-file=/tmp/a.pcm -xc++ -Xclang=-emit-module -c > > b_proto.cppmap -o /tmp/b.pcm > > > > $ CLANG=/tmp/repro/clang-bad /tmp/repro/standalone/repro.sh > > ++ dirname /tmp/repro/standalone/repro.sh > > + DIR=/tmp/repro/standalone > > + cd /tmp/repro/standalone > > + rm -rf 1 2 > > + mkdir -p 1/q/a_proto/mypkg 1/p/a_proto/mypkg 2/q/a_proto/mypkg > > 2/p/b_proto/mypkg > > + ln -sf /tmp/repro/standalone/files/a_proto.cppmap 1/ > > + ln -sf /tmp/repro/standalone/files/a.pb.h > > /tmp/repro/standalone/files/a.proto.h 1/p/a_proto/mypkg/ > > + ln -sf /tmp/repro/standalone/files/a.pb.h > > /tmp/repro/standalone/files/a.proto.h 1/q/a_proto/mypkg/ > > + ln -sf /tmp/repro/standalone/files/a_proto.cppmap > > /tmp/repro/standalone/files/b_proto.cppmap 2/ > > + ln -sf /tmp/repro/standalone/files/a.pb.h 2/q/a_proto/mypkg/ > > + ln -sf /tmp/repro/standalone/files/b.pb.h 2/p/b_proto/mypkg/ > > + CC_ARGS=('-Wno-mismatched-tags' '-Xclang=-fmodules-embed-all-files' > > '-Xclang=-fmodules-local-submodule-visibility' '-fmodules' > > '-fno-implicit-modules' '-fno-implicit-module-maps' '-std=gnu++17' > > '-Xclang=-fmodule-map-file-home-is-cwd') > > + export CC_ARGS > > + : /tmp > > + cd /tmp/repro/standalone/1 > > + /tmp/repro/clang-bad -Wno-mismatched-tags > > -Xclang=-fmodules-embed-all-files > > -Xclang=-fmodules-local-submodule-visibility -fmodules > > -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 > > -Xclang=-fmodule-map-file-home-is-cwd -I q/a_proto -fmodule-name=//a > > -fmodule-map-file=a_proto.cppmap -xc++ -Xclang=-emit-module -c > > a_proto.cppmap -o /tmp/a.pcm > > + cd /tmp/repro/standalone/2 > > + /tmp/repro/clang-bad -Wno-mismatched-tags > > -Xclang=-fmodules-embed-all-files > > -Xclang=-fmodules-local-submodule-visibility -fmodules > > -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 > > -Xclang=-fmodule-map-file-home-is-cwd -I q/b_proto -I q/a_proto > > -fmodule-name=//b -fmodule-map-file=b_proto.cppmap > > -Xclang=-fmodule-file=/tmp/a.pcm -xc++ -Xclang=-emit-module -c > > b_proto.cppmap -o /tmp/b.pcm > > While building module '//b': > > In file included from <module-includes>:1: > > In file included from ./p/b_proto/mypkg/b.pb.h:3: > > q/a_proto/mypkg/a.pb.h:3:10: fatal error: 'mypkg/a.proto.h' file not found > > #include "mypkg/a.proto.h" > > ^~~~~~~~~~~~~~~~~ > > 1 error generated. > > ``` > Great! Thank you for the reproducer! I (finally) got around to looking into this properly. What's happening here is that Clang deserializes `a.pcm` when compiling `b.pcm`. Prior to this patch, `ASTReader` would see the pragma diagnostic mapping for `"mypkg/a.proto.h"` and call `SourceManager::getDecomposedLoc()`. This ends up calling `ASTReader::getInputFile()`, which registers `"mypkg/a.proto.h"` in the `FileManager` (since `-fmodules-embed-all-files` makes it "transient"), making it visible to `HeaderSearch`. With this patch, we just read the `FileID` out of `a.pcm` without doing anything else. Since no other place in the compiler takes care to call `ASTReader::getInputFile()`, `"mypkg/a.proto.h"` doesn't get registered in `FileManager`. It's not reachable from any of the `-I` directories either, so the header search fails. Seems like Clang prior to this patch compiles your repro by accident. Another way to confirm this is removing the `#pragma clang diagnostic` lines in `"mypkg/a.proto.h"`. That makes the compilation to fail even prior to this patch. @rsmith Can you confirm that `-fmodules-embed-all-files` that you added in rG919ce235 is only supposed to make PCMs usable without requiring their input files on disk? (As opposed to making those input files available to the importer.) @alexfh & @eaeltsin: Could you try passing the appropriate include directory for `"mypkg/a.proto.h"` to the compilation of `b.pcm` and confirm this makes the problem go away with this patch applied? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D137213/new/ https://reviews.llvm.org/D137213 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits