kadircet marked 2 inline comments as done.
kadircet added a comment.

It has been a long time since I've proposed that change and even I forgot some 
of the high level details. Therefore, I wanted to sum up the state again so 
that we can decide on how to move forward.

Currently we have two different on-disk formats for index in clangd:

- Static index:
  - Merged single file, since merging is done before persistence, the data 
on-disk contains correct reference counts.
  - Produced by running SymbolCollector on each TU. Collector doesn't keep 
state between different TUs, merging is done on the caller-site.
  - No need to count references when loading, doesn't interact with 
`FileSymbols`.

- Background index:
  - Sharded multiple files.
  - Produced by running SymbolCollector on each TU, but instead of merging we 
separate those symbols into distinct files. Merging is done later on within 
`FileSymbols` which has no information about TUs.

FileSymbols is currently being used by BackgroundIndex and FileIndex(Dynamic 
Idx):

- In the case of BackgroundIndex, a symbol occurs at most twice(once in the 
header declaring it and once in the implementation file, if they are distinct), 
therefore when merging is done symbol references doesn't go above that.
- FileIndex doesn't perform de-duplication before-hand therefore it can perform 
reference counting while performing the merge. But currently it doesn't perform 
any merging.

As a result, changing `FileSymbols` would only effect `BackgroundIndex` and 
nothing else. Since merging occurs using the information in `RefSlabs` this can 
also be extended to benefit `FileIndex` and will most definitely be used
by any sort of indexing that performs merging over multiple files.

For unification purposes I would rather delete the `References` counting logic 
in `SymbolCollector` and perform it only at mergers(clangd-indexer and 
FileSymbols)



================
Comment at: clang-tools-extra/clangd/index/Background.h:113
 
+  // Note that FileSymbols counts References by incrementing it per each file
+  // mentioning the symbol, including headers. This contradicts with the
----------------
ilya-biryukov wrote:
> We should agree on and stick to a single behavior in both cases.
> I suggest keeping the current behavior for now (translation units). Why can't 
> we implement that?
Because `FileSymbols` only knows about file names and has no idea about whether 
a given file is TU or not. We might add some heuristics on the file extensions, 
or change SymbolCollector to count number of files(this would require 
maintaining a cache to de-duplicate references coming from the same files but 
originating from a different TU).


================
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:52
+          continue;
+        // If this is the first time we see references to a symbol, reset its
+        // Reference count, since filesymbols re-count number of references
----------------
ilya-biryukov wrote:
> The comment suggests there's something cheesy going on.
> Why would FileSymbols recompute the number of references? Can we avoid this 
> complicated logic?
This was the idea behind https://github.com/clangd/clangd/issues/23. Please see 
the main comment for a higher level discussion.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59481/new/

https://reviews.llvm.org/D59481



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to