sammccall added a comment.

I've landed the caching patch so this is good to go.
The merge is *conceptually* trivial but the enclosing function has moved so you 
may need to reapply it by hand, sorry...



================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:801
+  if (IsSystem)
+    return "<" + ShorterInclude + ">";
   // Standard case: just insert the file itself.
----------------
sammccall wrote:
> kadircet wrote:
> > sammccall wrote:
> > > This is a great heuristic, now I'm thinking of all the ways it can go 
> > > wrong...
> > > 
> > > ---
> > > 
> > > It could be slow: it's doing a string comparison for every include path 
> > > entry (not just the -isystems) and that can be a long list for big 
> > > codebases. We're calling this for most symbols we encounter, so in 
> > > principle this looks quadratic in codebase size when indexing a preamble.
> > > 
> > > It's tempting to throw a cache on it but actually... the callsite looks 
> > > like:
> > > ```
> > > foreach header:
> > >   foreach symbol:
> > >     adjust(getIncludeHeader(symbol, header));
> > > ```
> > > 
> > > getIncludeHeader() is entirely dependent on the header almost ignores the 
> > > symbol, so hoisting it out of the loop would be as good as caching it.
> > > 
> > > The wrinkle is the std::move hack, but we can move *that* part alone 
> > > inside the loop instead.
> > > 
> > > ---
> > > 
> > > It could fail to translate properly between paths, being inserted where 
> > > the -isystem wasn't available. Often this means the *library* isn't 
> > > available in this part of the codebase, so really the problem is offering 
> > > the symbol at all (though using verbatim headers may make this harder to 
> > > fix). Dynamic indexes spanning multiple projects are another case, but I 
> > > think a marginal one. I don't think we need to address this.
> > > 
> > > ---
> > > 
> > > It may give inconsistent results between TUs for the same symbol.
> > > When a definition is indexed, it should win, so the problematic cases are:
> > >  - header-only symbols (or no definition indexed) with inconsistent 
> > > -isystem. This is not terribly important, as above I think.
> > >  - definition is missing -isystem, because *within* the library headers 
> > > are not considered "system". This seems likely to be common, and I'm not 
> > > sure how to address it (other than giving verbatim headers priority 
> > > always, which seems dubious). It's not a regression though (just the bug 
> > > doesn't get fixed for these symbols.)
> > >  - definition has -isystem (library impl considers its own headers as 
> > > system) but other files don't. Also not sure how to solve this but it 
> > > seems less likely to be common to me.
> > > 
> > > One design around these problems would be to store *both* a preferred 
> > > spelling and a filename, and to use the spelling if it can be resolved. 
> > > But this is invasive, means doing stat()s, doesn't work well with 
> > > genfiles... I'm not really sure of a clean solution here.
> > > 
> > > Maybe we just move ahead as-is and see if we hit problems...
> > for latency concerns, i was mostly turning a blind eye as this would only 
> > happen once per indexing. but you are right, we can get away from any 
> > possible regressions by moving it to per-header layer.
> > 
> > ---
> > 
> > > It could fail to translate properly between paths, being inserted where 
> > > the -isystem wasn't available. Often this means the *library* isn't 
> > > available in this part of the codebase, so really the problem is offering 
> > > the symbol at all (though using verbatim headers may make this harder to 
> > > fix). Dynamic indexes spanning multiple projects are another case, but I 
> > > think a marginal one. I don't think we need to address this.
> > 
> > as you mentioned this is likely to show up either when there's a 
> > remote-index/static-index built on another machine in use or dynamic index 
> > with multiple projects.
> > - for remote-index/static-index case, i think it is unlikely to show up, as 
> > the symbol should be available during local development too.
> > -for dynamic index spanning multiple indexes though, previously we would 
> > just suggest a wrong symbol and now we'll also perform a wrong header 
> > insertion so it is sort-of worse. but as you mentioned this should probably 
> > be handled on symbol level as well.
> > 
> > my mental model around the fix was rather making use of the 
> > declaration/definition location of the symbol, which is guaranteed to be a 
> > URI, rather than a symbolic representation of the "owning" header. hence i 
> > feel like this shouldn't make things harder when we are fixing that issue.
> > 
> > > It may give inconsistent results between TUs for the same symbol.
> > 
> > I didn't consider this case thoroughly, as I was assuming system search 
> > paths to be same for the whole codebase (leaving multiple-projects case 
> > out).
> > 
> > > header-only symbols (or no definition indexed) with inconsistent 
> > > -isystem. This is not terribly important, as above I think.
> > 
> > well, in theory we just accumulate include headers in this case, which 
> > means we can still recover somehow if it turns out to be important.
> > 
> > > definition is missing -isystem, because *within* the library headers are 
> > > not considered "system". This seems likely to be common, and I'm not sure 
> > > how to address it (other than giving verbatim headers priority always, 
> > > which seems dubious). It's not a regression though (just the bug doesn't 
> > > get fixed for these symbols.)
> > 
> > ugh that's true :/ i am not really sure what's the convention for these in 
> > libraries themselves, but even with clangd if we somehow end up indexing 
> > definition locations for those symbols and user triggers a go-to, clangd 
> > will likely use fallback commands to index those files, which will be 
> > missing `-isystem` like flags. but as you mentioned this just means some 
> > cases are left unfixed.
> > 
> > as for solution, the best i can think of is we keep storing URIs for 
> > include headers, in all cases and never store verbatim. then we figure out 
> > the spelled include at use time, while making use of headersearchinfo and 
> > URI specific "get the spelling" logic (we probably want to have some other 
> > mechanism for the latter).
> > but it means we perform the resolution on every code completion item (which 
> > we already do, when the stored include isn't verbatim and uri spelling 
> > doesn't yield anything), so this shouldn't terribly regress completion 
> > latency.
> > 
> > basically what i am suggesting is propagating headersearchinfo into 
> > `URI::getIncludeSpelling`, which is more invasive then this change and 
> > conceptually doesn't sound so appealing. here's another terrible idea :), 
> > maybe we should have a module hook instead for generating custom include 
> > spellings ? (but we don't want to iterate over all modules on every 
> > completion i suppose, so this will definitely require some extra tricks 
> > like fetching a functor from the module, also there's the question of 
> > multiple modules implementing the hook..)
> > 
> > so yeah this is definitely more invasive, hence i'd rather go with the 
> > solution proposed here until we see some complaints around this specific 
> > workflow, wdyt?
> > 
> > > definition has -isystem (library impl considers its own headers as 
> > > system) but other files don't. Also not sure how to solve this but it 
> > > seems less likely to be common to me.
> > 
> > i think the approach mentioned above should help with this too.
> > this would only happen once per indexing
> 
> I'm not quite sure what this means, am I wrong to think this happens each 
> time we see a symbol in a preamble? In any case, avoiding calling this 
> function in the inner loop is a separate optimization I can send a patch for.
> 
> > my mental model around the fix was rather making use of the 
> > declaration/definition location of the symbol, which is guaranteed to be a 
> > URI
> 
> Of course, you're right. Nevermind this point.
> 
> > I was assuming system search paths to be same for the whole codebase
> 
> I think that assumption is broken for the combination of:
>  - decent size project with meaningful subcomponents whose deps vary, like 
> llvm-project (or even smaller ones)
>  - a liberal use of "system", e.g. i've often seen OpenGL headers used with 
> <>, to pick a random example
> 
> > well, in theory we just accumulate include headers in this case
> 
> ... and I forgot we collect multiple and "vote" on them. That makes this much 
> less bad I think.
> 
> >> definition is missing -isystem, because *within* the library headers are 
> >> not considered "system". 
> > ugh that's true :/ i am not really sure what's the convention for these in 
> > libraries themselves
> 
> Well, this isn't a regression (though inconsistent) and in the large majority 
> of cases we're not actually going to be indexing the definitions. So let's 
> ship it and see what breaks.
> It's tempting to throw a cache on it but actually... the callsite looks like:
> ```
> foreach header:
>  foreach symbol:
>   adjust(getIncludeHeader(symbol, header));
> ```

Haha, actually I was completely misreading the code, sorry for the detour here. 
(Worked out well in the end, mostly due to luck and a bit of stubbornness 
though)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98242/new/

https://reviews.llvm.org/D98242

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to