sammccall added a comment. Mostly looks good, a few further simplifications...
================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:59 + CurrentID = std::lower_bound(CurrentID, std::end(DecompressedChunk), ID); + // Return if the position was found in current chunk. + if (CurrentID != std::end(DecompressedChunk)) ---------------- I find "if the position was found" somewhat misleading - even if the ID is not in the chunk we can his this case. Maybe extract `normalizeCursor()` here, with ``` // If the cursor is at the end of a chunk, place it at the start of the next chunk. void normalizeCursor() { ... } ``` This can be shared with advance(). ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:116 + decltype(Chunks)::const_iterator CurrentChunk; + llvm::SmallVector<DocID, Chunk::PayloadSize + 1> DecompressedChunk; + // Iterator over DecompressedChunk. ---------------- Comment the invariants here, e.g. ``` // If CurrentChunk is valid, then DecompressedChunk is CurrentChunk->decompress() // and CurrentID is a valid (non-end) iterator into it. ``` ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:128 + // of meaningful bytes divided by Chunk::BitsPerEncodingByte. + size_t Width = (sizeof(DocID) * 8 - llvm::countLeadingZeros(Delta) + + Chunk::BitsPerEncodingByte - 1) / ---------------- This gives the wrong answer for zero. So assert Delta != 0? ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:128 + // of meaningful bytes divided by Chunk::BitsPerEncodingByte. + size_t Width = (sizeof(DocID) * 8 - llvm::countLeadingZeros(Delta) + + Chunk::BitsPerEncodingByte - 1) / ---------------- sammccall wrote: > This gives the wrong answer for zero. So assert Delta != 0? nit: `int` or `unsigned`, not `size_t` ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:128 + // of meaningful bytes divided by Chunk::BitsPerEncodingByte. + size_t Width = (sizeof(DocID) * 8 - llvm::countLeadingZeros(Delta) + + Chunk::BitsPerEncodingByte - 1) / ---------------- sammccall wrote: > sammccall wrote: > > This gives the wrong answer for zero. So assert Delta != 0? > nit: `int` or `unsigned`, not `size_t` `Width = 1 + findLastSet(Delta) / 7` ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:138 + << Chunk::BitsPerEncodingByte * I; + uint8_t Encoding = (Delta & Mask) >> (Chunk::BitsPerEncodingByte * I); + bool HasNextByte = I + 1 != Width; ---------------- The mask doesn't need to vary, just apply it after shifting. But really this loop is much clearer if you modify delta in place. ``` do { Encoding = Delta & 0x7f; Delta >>= 7; Payload.front() = Delta ? Encoding : 0x80 | Encoding; Payload = Payload.drop_front(); } while (Delta != 0); ``` ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:171 + llvm::SmallVector<Chunk, 1> Result; + std::array<uint8_t, Chunk::PayloadSize> Payload; + Payload.fill(0); ---------------- Why not just declare a chunk here (or use Result.back() and work in place)? ``` Result.emplace_back(); DocID Last = Result.back().Head = Documents.front(); MutableArrayRef<uint8_t> RemainingPayload = Result.back().Payload; for (DocID Doc : Documents.drop_front()) { // no need to handle I == 0 special case. if (!encodeVByte(Doc - Last, RemainingPayload)) { // didn't fit, flush chunk Result.emplace_back(); Result.back().Head = Doc; RemainingPayload = Result.back().Payload; } Last = Doc; } ``` more values, fewer indices ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:173 + Payload.fill(0); + llvm::MutableArrayRef<uint8_t> PayloadRef(Payload); + size_t HeadIndex = 0; ---------------- `PayloadRef` doesn't really describe the function of this variable. Suggest `RemainingPayload` or `EmptyPayload` or so ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:192 +/// Reads variable length DocID from the buffer and updates the buffer size. If +/// the stream is terminated, return None. +llvm::Optional<DocID> readVByte(llvm::ArrayRef<uint8_t> &Bytes) { ---------------- nit: if the stream is terminated, **consumes all bytes and** returns None. ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.h:41 +/// decompressed upon request. +struct Chunk { + // Keep sizeof(Chunk) == 32. ---------------- sammccall wrote: > With the current implementation, this doesn't need to be in the header. > (the layout of vector<chunk> doesn't depend on chunk, you should just need to > out-line the default destructor) > > (using SmallVector<Chunk, 1> or maybe 2 *might* be a win. I'd expect not > though. I'd either stick with std::vector, or measure) (meta-nit: please don't mark comments as done if they're not done - rather explain why you didn't do them!) ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.h:41 +/// decompressed upon request. +struct Chunk { + // Keep sizeof(Chunk) == 32. ---------------- sammccall wrote: > sammccall wrote: > > With the current implementation, this doesn't need to be in the header. > > (the layout of vector<chunk> doesn't depend on chunk, you should just need > > to out-line the default destructor) > > > > (using SmallVector<Chunk, 1> or maybe 2 *might* be a win. I'd expect not > > though. I'd either stick with std::vector, or measure) > (meta-nit: please don't mark comments as done if they're not done - rather > explain why you didn't do them!) > (using SmallVector<Chunk, 1> or maybe 2 *might* be a win. I'd expect not > though. I'd either stick with std::vector, or measure) You changed this to SmallVector - what were the measurements? (SmallVector is going to be bigger than Vector whenever you overrun, so it's worth checking) ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.h:39 +/// Chunk is a fixed-width piece of PostingList which contains the first DocID +/// in uncompressed format (Head) and delta-encoded Payload. It can be ---------------- mark as an implementation detail so readers aren't confused. ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.h:46 + /// Number of DocID bits in each encoding byte. + static constexpr size_t BitsPerEncodingByte = 7; + ---------------- this is an implementation detail, move to cpp file (or just inline) ================ Comment at: clang-tools-extra/clangd/index/dex/PostingList.h:50 + + /// The first element of + DocID Head; ---------------- ?? https://reviews.llvm.org/D52300 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits