ioeric added a comment. Some high-level comments before jumping into details.
================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.cpp:23 + +// FIXME(kbobyrev): Deal with short symbol symbol names. +std::vector<SearchAtom> generateSearchAtoms(StringRef SymbolName) { ---------------- maybe also add how short symbol names are handled in the current algorithm and what the potential solution would be. ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.cpp:28 + // Apply lowercase text normalization. + for (auto &Token : Tokens) + std::for_each(Token.begin(), Token.end(), ::tolower); ---------------- Any reason not to do this in segmentation? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:27 + +/// \brief Hashable SearchAtom, which represents a search token primitive. +/// ---------------- Any reason to call this `Atom` instead of `Token`? `Token` seems to be a more commonly used name for this (e.g. https://yomguithereal.github.io/mnemonist/inverted-index#tokens). ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:45 +public: + enum class Namespace : short { + Trigram, ---------------- `Namespace` can be easily confused with namespaces in clang world. Maybe `Kind` or `Type`? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:51 + + SearchAtom() = default; + SearchAtom(llvm::StringRef Data, Namespace Type = Namespace::Trigram) ---------------- What is an empty atom? Why do we need it? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:52 + SearchAtom() = default; + SearchAtom(llvm::StringRef Data, Namespace Type = Namespace::Trigram) + : Data(Data), Hash(std::hash<std::string>{}(Data)), Type(Type) {} ---------------- Why default `Type` to `Trigram`? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:52 + SearchAtom() = default; + SearchAtom(llvm::StringRef Data, Namespace Type = Namespace::Trigram) + : Data(Data), Hash(std::hash<std::string>{}(Data)), Type(Type) {} ---------------- ioeric wrote: > Why default `Type` to `Trigram`? Please add documentation. What is the semantic of `Data`? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:53 + SearchAtom(llvm::StringRef Data, Namespace Type = Namespace::Trigram) + : Data(Data), Hash(std::hash<std::string>{}(Data)), Type(Type) {} + ---------------- Should we also incorporate `Type` into hash? ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:53 + SearchAtom(llvm::StringRef Data, Namespace Type = Namespace::Trigram) + : Data(Data), Hash(std::hash<std::string>{}(Data)), Type(Type) {} + ---------------- ioeric wrote: > Should we also incorporate `Type` into hash? I'm wondering if we should use different hashes for different token types. For example, a trigram token "xyz" can be encoded in a 4 -byte int with `('x'<<16) & ('y'<<8) & 'z'`, which seems cheaper than `std::hash`. ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:62 + + const llvm::StringRef getData() const { return Data; } + ---------------- nit: s/getData/data/ s/getType/type/ ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:76 + +/// \brief Splits unqualified symbol name into tokens for trigram generation. +/// ---------------- nit: I'd avoid calling these `tokens`, as it they can be confused with tokens for the invert index. How about `segments`? Similarly, `tokenize` should probably be something like `segment` or `segmentIdentifier`. ================ Comment at: clang-tools-extra/clangd/index/noctem/SearchAtom.h:141 +/// trigram belongs to more than one class it is only inserted once. +std::vector<SearchAtom> generateSearchAtoms(llvm::StringRef SymbolName); + ---------------- `generateSearchAtoms` is too generalized a name for this. Maybe just `trigram()`? I also think this could take a list of segments from `tokenize`. https://reviews.llvm.org/D49417 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits