ioeric added inline comments.
================ Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:74 + // symbol of the identifier. + if (!FoundFirstSymbol) { + FoundFirstSymbol = true; ---------------- kbobyrev wrote: > ioeric wrote: > > Could this be pulled out of the loop? I think what we want is just > > `LowercaseIdentifier[0]` right? > > > > I'd probably also pulled that into a function, as the function body is > > getting larger. > Same as elsewhere, if we have `__builtin_whatever` the it's not actually the > first symbol of the lowercase identifier. I would argue that I would start by typing "_" if I actually want `__builtin_whatever`. I'm also not sure if this is the case we should optimize for as well; __builtin symbols are already penalized in code completion ranking. ================ Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:115 +// FIXME(kbobyrev): Correctly handle empty trigrams "$$$". std::vector<Token> generateQueryTrigrams(llvm::StringRef Query) { // Apply fuzzy matching text segmentation. ---------------- kbobyrev wrote: > ioeric wrote: > > It seems to me that what we need for short queries is simply: > > ``` > > if (Query.empty()) { > > // return empty token > > } > > if (Query.size() == 1) return {Query + "$$"}; > > if (Query.size() == 2) return {Query + "$"}; > > > > // Longer queries... > > ``` > > ? > That would mean that we expect the query to be "valid", i.e. only consist of > letters and digits. My concern is about what happens if we have `"u_"` or > something similar (`"_u", "_u_", "$u$"`, etc) - in that case we would > actually still have to identify the first valid symbol for the trigram, > process the string (trim it, etc) which looks very similar to what > FuzzyMatching `calculateRoles` does. > > The current approach is rather straightforward and generic, but I can try to > change it if you want. My biggest concern is fighting some corner cases and > ensuring that the query is "right" on the user (index) side, which might turn > out to be more code and ensuring that the "state" is valid throughout the > pipeline. It's not clear what we would want to match with "*_", except for `u_` in `unique_ptr` (maybe). Generally, as short queries tend to match many more symbols, I think we should try to make them more restrictive and optimize for the most common use case. https://reviews.llvm.org/D50517 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits