kbobyrev added inline comments.
================ Comment at: clang-tools-extra/clangd/index/dex/Dex.cpp:251 Bytes += InvertedIndex.getMemorySize(); - for (const auto &P : InvertedIndex) - Bytes += P.second.bytes(); + for (const auto &TokenToPostingList : InvertedIndex) + Bytes += TokenToPostingList.first.Data.size() + ---------------- ioeric wrote: > kbobyrev wrote: > > ioeric wrote: > > > kbobyrev wrote: > > > > ioeric wrote: > > > > > Would `InvertedIndex.getMemorySize()` be a better estimate? > > > > IIUC this is only precise for the objects which do not make any > > > > allocations (e.g. `std::vector` and `std::string` memory estimate would > > > > not be "correct"). > > > > > > > > From the documentation > > > > > > > > > This is just the raw memory used by DenseMap. If entries are pointers > > > > > to objects, the size of the referenced objects are not included. > > > > > > > > I also have `Bytes += InvertedIndex.getMemorySize();` above, so the > > > > purpose of this code would be to take into account the size of these > > > > "reference objects". > > > > > > > > However, since `PostingList::bytes()` also returns underlying > > > > `std::vector` size, this code will probably add these `std::vector` > > > > objects size twice (the first one was in > > > > `InvertedIndex.getMemorySize()`). I should fix that. > > > > `If entries are pointers to objects, the size of the referenced objects > > > > are not included.` > > > This applies to *pointers* to objects, but the `PostingList`s in > > > InvertedList are not pointerd? So it seems to me that > > > `InvertedIndex.getMemorySize()` covers everything in the `InvertedIndex`. > > > One way to verify is probably check the size calculated with loop against > > > `InvertedIndex.getMemorySize()`. > > As discussed offline, pointers and references are an example of objects > > which would be incorrectly measured by `DesneMap::getMemorySize()`, but > > `std::vector` and `std::string` are pointers in a way because they also > > just store a pointer to the underlying storage which is hidden from > > `DenseMap::getMemorySize()`; thus, it would be more precise to use the > > proposed scheme for memory estimation. > I see. I was expecting DenseMap to maintain an arena which could provide > memory usage of all owned data, but it seems to only consider > `sizeof(KV-Pair)`... > > If the posting lists (vectors) are not covered, shouldn't we also also add > the size of tokens (string) here? Yes, I was also adding `TokenToPostingList.first.Data.size()` to the size estimate in the previous revision snapshot, but as @sammccall pointed out, there are multiple string implementations and one of them (IIUC this one is currently pushed in the C++ Standard) utilizes Small Object Optimization (i.e. it behaves like `llvm::SmallVector` rather than `std::vector`). Therefore, it does not perform external allocations before some limit is reached. I don't know whether it is possible to know whether the external allocation happened or not and the summary of our discussion with Sam was that we should just omit this. https://reviews.llvm.org/D52503 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits