kadircet added a comment.

In D80900#2066327 <https://reviews.llvm.org/D80900#2066327>, @sammccall wrote:

> TL;DR: I think there are three viable paths that we should consider:
>  A) minimal change: put the FSProvider in ParseInputs
>  B) this patch, but with more documentation and safety
>  C) fight to clean up VFS multithreading semantics
>
> Assuming we don't want to block on C, I'm not sure whether A or B is 
> conceptually better. But A is a smaller change and brings side benefits, so 
> wondering if you see advantages in B.


I didn't want to go for A) as I was afraid of satisfying the contract on 
`FileSystemProvider::getFileSystem`

  /// Context::current() will be the context passed to the clang entrypoint,
  /// such as addDocument(), and will also be propagated to result callbacks.

As this patch keep the `FileSystemProvider` inside TUScheduler.cpp and we need 
to ensure context is always the one we received in the entrypoint. Whereas 
`ParseInputs` is widely
passed around, and ensuring that assumption holds might be hard.

> You're right about the race condition of course and I'm sure this patch is 
> correct.
>  But this rubs me the wrong way a bit - the API change doesn't seem clearly 
> motivated either from a high level (we conceptually *want* a single FS to 
> build a single version of code) or from a low level (what are we actually 
> doing in parallel that isn't threadsafe?).
> 
> Looking at the low level perspective first. VFS has three classes 
> threadsafety issues:
> 
> - it contains the "working directory" as mutable state, which isn't 
> multi-threading friendly. We do set working directory concurrently with other 
> work, but always to the same thing. We could factor our way around this by 
> setting the working directory prior to any concurrency (to CDB.Directory) and 
> ensuring we never set it again.
> - the RealFilesystem in real-working-directory mode is a singleton so its use 
> is thread-hostile. This isn't really relevant to us because we use 
> emulated-working-directory mode and have distinct instances. But it probably 
> informed VFS's lack of useful threading semantics.
> - read operations aren't `const` or documented as threadsafe even though the 
> canonical `RealFilesystem` is. In-tree implementations seem to be threadsafe, 
> some out-of-tree ones are not :-( This definitely bites us here.
> 
>   I think it would be fundamentally possible to use a single VFS here. We 
> need to modify the VFS interface to make concurrent reads safe (and const, 
> while here) and fix any implementations. This isn't trivial. I think it's 
> worthwhile, but experience tells me different projects want different things 
> from the VFS interface...

I totally agree, this was actually my initial motivation. But I wasn't sure if 
it is worth the hassle, as this might receive some push-back from downstream 
users (VFS interface is not just in clang, but in llvm) and coming to a 
consensus is likely to take time.
It would be nice to clear that technical debt, but I don't think it provides 
any other practical benefits.

> The high-level view: IMO FSProvider is basically a hack around VFS's 
> thread-safety issues, rather than a fundamental part of the application - 
> it's a filesystem that doesn't have an attached working directory and whose 
> read operations (e.g. `getFileSystem()->status()`) are threadsafe. We could 
> rename FSProvider -> ThreadsafeFS and getFileSystem() -> newCursor(WD) or 
> something, and maybe we should. 
>  From this perspective, this patch is changing two things that have little to 
> do with each other: it's replacing FS -> ThreadsafeFS and it's moving the FS 
> ownership from one revision of the file to the TUScheduler itself.
> 
> I'm not really sure whether the latter is a good or a bad idea. If we want to 
> do it, the use of ParseInputs as the interface to TUScheduler is a bit 
> confusing - we should at least document that FS isn't used (as we do for 
> CompileCommand) and probably assert that it's null, too.

Right, the latter was a consequence of not storing FSProvider in ParseInputs, 
rather than a deliberate one. We can also push it around as a parameter to 
update, rather than a member if that looks more clear.
I didn't do that as it required more plumbing and change was actually local to 
TUScheduler.

> But maybe simpler would be to "just" change the VFS member in ParseInputs to 
> a `const FSProvider*`. This keeps the design approximately what it is now, 
> but simplifies the ParseInputs concept: it's now truly readonly data, and 
> functions that consume it won't have effects that escape.

Isn't this just what you proposed in A) ? My concern above applies to this one 
too. It is not a practical concern as of today, as most uses of ParseInputs is 
through callbacks in ASTWorker, hence the context should be implicitly set. 
Unless, someone attaches  another async callback.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80900/new/

https://reviews.llvm.org/D80900



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to