ioeric added inline comments.

================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:74
+    // symbol of the identifier.
+    if (!FoundFirstSymbol) {
+      FoundFirstSymbol = true;
----------------
kbobyrev wrote:
> ioeric wrote:
> > Could this be pulled out of the loop? I think what we want is just 
> > `LowercaseIdentifier[0]` right?
> > 
> > I'd probably also pulled that into a function, as the function body is 
> > getting larger.
> Same as elsewhere, if we have `__builtin_whatever` the it's not actually the 
> first symbol of the lowercase identifier.
I would argue that I would start by typing "_" if I actually want 
`__builtin_whatever`. I'm also not sure if this is the case we should optimize 
for as well; __builtin symbols  are already penalized in code completion 
ranking.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:115
+// FIXME(kbobyrev): Correctly handle empty trigrams "$$$".
 std::vector<Token> generateQueryTrigrams(llvm::StringRef Query) {
   // Apply fuzzy matching text segmentation.
----------------
kbobyrev wrote:
> ioeric wrote:
> > It seems to me that what we need for short queries is simply:
> > ```
> > if (Query.empty()) {
> >    // return empty token
> > }
> > if (Query.size() == 1) return {Query + "$$"};
> > if (Query.size() == 2) return {Query + "$"};
> > 
> > // Longer queries...
> > ```
> > ?
> That would mean that we expect the query to be "valid", i.e. only consist of 
> letters and digits. My concern is about what happens if we have `"u_"` or 
> something similar (`"_u", "_u_", "$u$"`, etc) - in that case we would 
> actually still have to identify the first valid symbol for the trigram, 
> process the string (trim it, etc) which looks very similar to what 
> FuzzyMatching `calculateRoles` does.
> 
> The current approach is rather straightforward and generic, but I can try to 
> change it if you want. My biggest concern is fighting some corner cases and 
> ensuring that the query is "right" on the user (index) side, which might turn 
> out to be more code and ensuring that the "state" is valid throughout the 
> pipeline.
It's not clear what we would want to match with "*_", except for `u_` in 
`unique_ptr` (maybe).  

Generally, as short queries tend to match many more symbols, I think we should 
try to make them more restrictive and optimize for the most common use case. 


https://reviews.llvm.org/D50517



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to