sammccall added a reviewer: kadircet.
sammccall added a comment.

Thanks! This is a really good idea, but not without its risks :-) Sorry for not 
getting to this for a while!

My only *really* high-level design question is: I wonder what the tradeoffs are 
in keeping just the preamble vs the whole ASTWorker. I'd *expect* this approach 
gets almost all the benefit without complicating the basic assumptions around 
ASTWorker, so it's probably the right way to go. Haven't thought about it much 
though...

---

I've got a bit of cognitive load because there's a few preamble-related ideas 
we've had, and I'm trying to work out how they interact. Let's start by writing 
them down...

1. a *persistent* cache so closing+reopening clangd loses less state. (This is 
complicated because only the PCH is easily serializable; the rest of the 
PreambleData struct isn't.)
2. building caches of preambles while background-indexing (this would be great 
for modules but is probably way too big for whole preambles)
3. reusing the "wrong" preamble initially when you open a new file, to give 
some basic functionality (using existing preamble patching logic, just in a 
more aggressive scenario)
4. having the disk-based storage unlink the file preemptively, to eliminate any 
chance of leaking the *.pch

1&2 are both big projects, so I'm not really worried about the overlap, or at 
least I'm not going to try to predict how it'd be resolved.

3 is actually made *easier* by this patch I think: having a central repository 
of preambles makes it more natural to grab some "other" preamble.

4 I think doesn't really interact at all: the PreambleData object would own a 
file descriptor instead of a file, but it's all much the same.

So I think in summary there's not really any conflict to resolve between these 
ideas. cc @kadircet, though, who's done more thinking about #1 and #3 than I have.

---

I think we need to be fairly careful about policies around this cache. 
Preambles are large (we frequently see several hundred MB). Some workflows 
involve opening many files at a time. Some workflows end up running multiple 
copies of clangd on the same files. Some configurations keep them in memory 
rather than on disk. So a too-large cache could waste quite a lot of resources.

So, some pointy questions:

- landing so close to the 12 release, should we conservatively default this to 
0, and require opt-in?
- is MB or #preambles a better limit to the cache size?
- should we take size into account when deciding what to evict? (my guess is 
no, cost scales with size, and value scales with size * probability of reuse, 
so we should purely optimize for probability of reuse)
- can we do better than LRU? The cache is accessed so infrequently and misses 
are so horrendously expensive that we could certainly afford to e.g. track 
usage history of every file ever opened, if it would help performance and not 
add too much complexity.
- not a question, but I can say for sure that 1000 with no size limit isn't a 
safe default for disk :-(



================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:193
+  /// Get the preamble associated with a \p Key, removing
+  /// it from the cache
+  std::shared_ptr<const PreambleData> take(llvm::StringRef Key) {
----------------
we might instead consider keeping "active" preambles in the cache, and simply 
considering their cost to be 0/ineligible for eviction while 
shared_ptr::use_count() > 1.
This allows the cache to act as a "registry", so we can try using a preamble 
from a different TU as mentioned above.

(but this could also be done later, or could be done with a separate table of 
weak_ptrs. No change needed for this patch, just thinking out loud)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
