kadircet added a comment. In D80900#2066327 <https://reviews.llvm.org/D80900#2066327>, @sammccall wrote:
> TL;DR: I think there are three viable paths that we should consider: > A) minimal change: put the FSProvider in ParseInputs > B) this patch, but with more documentation and safety > C) fight to clean up VFS multithreading semantics > > Assuming we don't want to block on C, I'm not sure whether A or B is > conceptually better. But A is a smaller change and brings side benefits, so > wondering if you see advantages in B. I didn't want to go for A) as I was afraid of satisfying the contract on `FileSystemProvider::getFileSystem` /// Context::current() will be the context passed to the clang entrypoint, /// such as addDocument(), and will also be propagated to result callbacks. As this patch keep the `FileSystemProvider` inside TUScheduler.cpp and we need to ensure context is always the one we received in the entrypoint. Whereas `ParseInputs` is widely passed around, and ensuring that assumption holds might be hard. > You're right about the race condition of course and I'm sure this patch is > correct. > But this rubs me the wrong way a bit - the API change doesn't seem clearly > motivated either from a high level (we conceptually *want* a single FS to > build a single version of code) or from a low level (what are we actually > doing in parallel that isn't threadsafe?). > > Looking at the low level perspective first. VFS has three classes > threadsafety issues: > > - it contains the "working directory" as mutable state, which isn't > multi-threading friendly. We do set working directory concurrently with other > work, but always to the same thing. We could factor our way around this by > setting the working directory prior to any concurrency (to CDB.Directory) and > ensuring we never set it again. > - the RealFilesystem in real-working-directory mode is a singleton so its use > is thread-hostile. This isn't really relevant to us because we use > emulated-working-directory mode and have distinct instances. But it probably > informed VFS's lack of useful threading semantics. > - read operations aren't `const` or documented as threadsafe even though the > canonical `RealFilesystem` is. In-tree implementations seem to be threadsafe, > some out-of-tree ones are not :-( This definitely bites us here. > > I think it would be fundamentally possible to use a single VFS here. We > need to modify the VFS interface to make concurrent reads safe (and const, > while here) and fix any implementations. This isn't trivial. I think it's > worthwhile, but experience tells me different projects want different things > from the VFS interface... I totally agree, this was actually my initial motivation. But I wasn't sure if it is worth the hassle, as this might receive some push-back from downstream users (VFS interface is not just in clang, but in llvm) and coming to a consensus is likely to take time. It would be nice to clear that technical debt, but I don't think it provides any other practical benefits. > The high-level view: IMO FSProvider is basically a hack around VFS's > thread-safety issues, rather than a fundamental part of the application - > it's a filesystem that doesn't have an attached working directory and whose > read operations (e.g. `getFileSystem()->status()`) are threadsafe. We could > rename FSProvider -> ThreadsafeFS and getFileSystem() -> newCursor(WD) or > something, and maybe we should. > From this perspective, this patch is changing two things that have little to > do with each other: it's replacing FS -> ThreadsafeFS and it's moving the FS > ownership from one revision of the file to the TUScheduler itself. > > I'm not really sure whether the latter is a good or a bad idea. If we want to > do it, the use of ParseInputs as the interface to TUScheduler is a bit > confusing - we should at least document that FS isn't used (as we do for > CompileCommand) and probably assert that it's null, too. Right, the latter was a consequence of not storing FSProvider in ParseInputs, rather than a deliberate one. We can also push it around as a parameter to update, rather than a member if that looks more clear. I didn't do that as it required more plumbing and change was actually local to TUScheduler. > But maybe simpler would be to "just" change the VFS member in ParseInputs to > a `const FSProvider*`. This keeps the design approximately what it is now, > but simplifies the ParseInputs concept: it's now truly readonly data, and > functions that consume it won't have effects that escape. Isn't this just what you proposed in A) ? My concern above applies to this one too. It is not a practical concern as of today, as most uses of ParseInputs is through callbacks in ASTWorker, hence the context should be implicitly set. Unless, someone attaches another async callback. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80900/new/ https://reviews.llvm.org/D80900 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits