ioeric added inline comments.
================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:38 + +// FIXME(kbobyrev): Deal with short symbol symbol names. A viable approach would +// be generating unigrams and bigrams here, too. This would prevent symbol index ---------------- Please also add what the current behavior is for short names. Do we just ignore them? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:44 +generateTrigrams(const std::vector<llvm::SmallString<10>> &Segments) { + llvm::DenseSet<SearchToken> UniqueTrigrams; + std::vector<SearchToken> Trigrams; ---------------- Could we replace `UniqaueTrigrams`+`Trigrams` with a dense map from hash to token? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:47 + + // Extract trigrams consisting of first characters of tokens sorted by of + // token positions. Trigram generator is allowed to skip 1 word between each ---------------- nit: a redundant "of" EOL. ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:57 + // this case. + for (auto FirstSegment = Segments.begin(); FirstSegment != Segments.end(); + ++FirstSegment) { ---------------- nit: Maybe S1, S2, S3 instead of FirstSegment, ...? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:68 + SearchToken Trigram( + (*FirstSegment + *SecondSegment + *ThirdSegment).str(), + SearchToken::Kind::Trigram); ---------------- This seems wrong... wouldn't this give you a concatenation of three segments? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:68 + SearchToken Trigram( + (*FirstSegment + *SecondSegment + *ThirdSegment).str(), + SearchToken::Kind::Trigram); ---------------- ioeric wrote: > This seems wrong... wouldn't this give you a concatenation of three segments? For trigrams, it might make sense to put 3 chars into a `SmallVector<3>` (can be reused) and std::move it into the constructor. Might be cheaper than creating a std::string ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:87 + + for (size_t Position = 0; Position + 2 < Segment.size(); ++Position) + Trigrams.push_back( ---------------- nit: `Position < Segment.size() - 2` seems more commonly used. ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:92 + + for (auto FirstSegment = Segments.begin(); FirstSegment != Segments.end(); + ++FirstSegment) { ---------------- Comment for this loop? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:135 + // Skip underscores at the beginning, e.g. "__builtin_popcount". + while (SymbolName[SegmentStart] == '_') + ++SegmentStart; ---------------- use `trim`? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.cpp:138 + + for (size_t Index = SegmentStart; Index + 1 < SymbolName.size(); ++Index) { + const char CurrentSymbol = SymbolName[Index]; ---------------- Maybe first split name on `_` and them run further upper-lower segmentation on the split segments? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.h:58 + // Returns precomputed hash. + size_t operator()(const SearchToken &T) const { return Hash; } + ---------------- Any reason this has to be `operator()` instead of a `hash` method? `operator()` for hash value is not trivial on the call site. ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.h:69 +private: + friend llvm::hash_code hash_value(const SearchToken &Token) { + return Token.Hash; ---------------- Who's the user of this friend function? Could it just call `Token.hash()`? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.h:80 + /// * Trigram: 3 bytes containing trigram characters + /// * Scope: full scope name, e.g. "foo::bar::baz" + /// * Path: full or relative path to the directory ---------------- nit: scope in clangd world is "foo::bar::baz::", but the global scope is "". ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchToken.h:159 +/// \brief Combines segmentIdentifier() and generateTrigrams(Segments). +std::vector<SearchToken> generateTrigrams(llvm::StringRef SymbolName); + ---------------- This seems to be the same as just `generateTrigrams(segmentIdentifier(Name))`, so I'd drop it. https://reviews.llvm.org/D49417 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits