sammccall added inline comments.
================
Comment at: clangd/Symbol.h:37
+// The class presents a C++ symbol, e.g. class, function.
+struct Symbol {
+ // The symbol identifier, using USR.
----------------
malaperle wrote:
> sammccall wrote:
> > malaperle wrote:
> > > malaperle wrote:
> > > > sammccall wrote:
> > > > > hokein wrote:
> > > > > > malaperle wrote:
> > > > > > > sammccall wrote:
> > > > > > > > hokein wrote:
> > > > > > > > > malaperle wrote:
> > > > > > > > > > I think it would be nice to have methods as an interface to
> > > > > > > > > > get this data instead of storing them directly. So that an
> > > > > > > > > > index-on-disk could go fetch the data. Especially the
> > > > > > > > > > occurrences which can take a lot of memory (I'm working on
> > > > > > > > > > a branch that does that). But perhaps defining that
> > > > > > > > > > interface is not within the scope of this patch and could
> > > > > > > > > > be better discussed in D40548 ?
> > > > > > > > > I agree. We can't load all the symbol occurrences into the
> > > > > > > > > memory since they are too large. We need to design interface
> > > > > > > > > for the symbol occurrences.
> > > > > > > > >
> > > > > > > > > We could discuss the interface here, but CodeCompletion is
> > > > > > > > > the main thing which this patch focuses on.
> > > > > > > > > We can't load all the symbol occurrences into the memory
> > > > > > > > > since they are too large
> > > > > > > >
> > > > > > > > I've heard this often, but never backed up by data :-)
> > > > > > > >
> > > > > > > > Naively an array of references for a symbol could be doc ID +
> > > > > > > > offset + length, let's say 16 bytes.
> > > > > > > >
> > > > > > > > If a source file consisted entirely of references to
> > > > > > > > 1-character symbols separated by punctuation (1 reference per 2
> > > > > > > > bytes) then the total size of these references would be 8x the
> > > > > > > > size of the source file - in practice much less. That's not
> > > > > > > > very big.
> > > > > > > >
> > > > > > > > (Maybe there are edge cases with macros/templates, but we can
> > > > > > > > keep them under control)
> > > > > > > I'd have to break down how much memory it used by what, I'll come
> > > > > > > back to you on that. Indexing llvm with ClangdIndexDataStorage,
> > > > > > > which is pretty packed is about 200MB. That's already a lot
> > > > > > > considering we want to index code bases many times bigger. But
> > > > > > > I'll try to come up with more precise numbers. I'm open to
> > > > > > > different strategies.
> > > > > > >
> > > > > > I can see two points here:
> > > > > >
> > > > > > 1. For all symbol occurrences of a TU, it is not quite large, and
> > > > > > we can keep them in memory.
> > > > > > 2. For all symbol occurrences of a whole project, it's not a good
> > > > > > idea to load all of them into memory (For LLVM project, the size of
> > > > > > YAML dataset is ~1.2G).
> > > > > (This is still a sidebar - not asking for any changes)
> > > > >
> > > > > The YAML dataset is not a good proxy for how big the data is (at
> > > > > least without an effort to estimate constant factor).
> > > > > And "it's not a good idea" isn't an assertion that can hold without
> > > > > reasons, assumptions, and data.
> > > > > If the size turns out to be, say, 120MB for LLVM, and we want to
> > > > > scale to 10x that, and we're spending 500MB/file for ASTs, then it
> > > > > might well be a good trade.
> > > > > The YAML dataset is not a good proxy for how big the data is (at
> > > > > least without an effort to estimate constant factor).
> > > >
> > > > Indeed. I'll try to come up with more realistic numbers. There are
> > > > other things not accounted for in the 16 bytes mentioned above, like
> > > > storing roles and relations.
> > > >
> > > > > 500MB/file for ASTs
> > > >
> > > > What do you mean? 500MB worth of occurrences per file? Or Preambles
> > > > perhaps?
> > > > What do you mean? 500MB worth of occurrences per file? Or Preambles
> > > > perhaps?
> > >
> > > Oh I see, the AST must be in memory for fast reparse. I just tried
> > > opening 3 files at the same time I it was already around 500MB. Hmm,
> > > that's a bit alarming.
> > Right, just that we have to consider RAM usage for the index in the context
> > of clangd's overall requirements - if other parts of clangd use 1G of ram
> > for typical work on a large project, then we shouldn't rule out spending a
> > couple of 100MB on the index if it adds a lot of value.
> Agreed we have to consider the overall requirements. I think over 1GB of RAM
> is not good for our use case, whether is comes from the AST or index. I think
> it's perfectly normal if we have different requirements but we can see
> discuss how to design things so there are options to use either more RAM or
> disk space. It seems the AST would be the most important factor for now so
> perhaps it's something we should start investigating/discussing.
Agree, this is another thing we can discuss tonight.
================
Comment at: clangd/index/Index.h:52
+private:
+ friend class llvm::DenseMapInfo<clang::clangd::SymbolID>;
+
----------------
ioeric wrote:
> warning: class 'DenseMapInfo' was previously declared as a struct
> [-Wmismatched-tags]
Fixed in rCTE320554
Repository:
rCTE Clang Tools Extra
https://reviews.llvm.org/D40897
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits