sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

I think this is going to be more good than bad, and the alternatives are 
terrifically complicated.



================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:801
+  if (IsSystem)
+    return "<" + ShorterInclude + ">";
   // Standard case: just insert the file itself.
----------------
kadircet wrote:
> sammccall wrote:
> > This is a great heuristic, now I'm thinking of all the ways it can go 
> > wrong...
> > 
> > ---
> > 
> > It could be slow: it's doing a string comparison for every include path 
> > entry (not just the -isystems) and that can be a long list for big 
> > codebases. We're calling this for most symbols we encounter, so in 
> > principle this looks quadratic in codebase size when indexing a preamble.
> > 
> > It's tempting to throw a cache on it but actually... the callsite looks 
> > like:
> > ```
> > foreach header:
> >   foreach symbol:
> >     adjust(getIncludeHeader(symbol, header));
> > ```
> > 
> > getIncludeHeader() is entirely dependent on the header almost ignores the 
> > symbol, so hoisting it out of the loop would be as good as caching it.
> > 
> > The wrinkle is the std::move hack, but we can move *that* part alone inside 
> > the loop instead.
> > 
> > ---
> > 
> > It could fail to translate properly between paths, being inserted where the 
> > -isystem wasn't available. Often this means the *library* isn't available 
> > in this part of the codebase, so really the problem is offering the symbol 
> > at all (though using verbatim headers may make this harder to fix). Dynamic 
> > indexes spanning multiple projects are another case, but I think a marginal 
> > one. I don't think we need to address this.
> > 
> > ---
> > 
> > It may give inconsistent results between TUs for the same symbol.
> > When a definition is indexed, it should win, so the problematic cases are:
> >  - header-only symbols (or no definition indexed) with inconsistent 
> > -isystem. This is not terribly important, as above I think.
> >  - definition is missing -isystem, because *within* the library headers are 
> > not considered "system". This seems likely to be common, and I'm not sure 
> > how to address it (other than giving verbatim headers priority always, 
> > which seems dubious). It's not a regression though (just the bug doesn't 
> > get fixed for these symbols.)
> >  - definition has -isystem (library impl considers its own headers as 
> > system) but other files don't. Also not sure how to solve this but it seems 
> > less likely to be common to me.
> > 
> > One design around these problems would be to store *both* a preferred 
> > spelling and a filename, and to use the spelling if it can be resolved. But 
> > this is invasive, means doing stat()s, doesn't work well with genfiles... 
> > I'm not really sure of a clean solution here.
> > 
> > Maybe we just move ahead as-is and see if we hit problems...
> for latency concerns, i was mostly turning a blind eye as this would only 
> happen once per indexing. but you are right, we can get away from any 
> possible regressions by moving it to per-header layer.
> 
> ---
> 
> > It could fail to translate properly between paths, being inserted where the 
> > -isystem wasn't available. Often this means the *library* isn't available 
> > in this part of the codebase, so really the problem is offering the symbol 
> > at all (though using verbatim headers may make this harder to fix). Dynamic 
> > indexes spanning multiple projects are another case, but I think a marginal 
> > one. I don't think we need to address this.
> 
> as you mentioned this is likely to show up either when there's a 
> remote-index/static-index built on another machine in use or dynamic index 
> with multiple projects.
> - for remote-index/static-index case, i think it is unlikely to show up, as 
> the symbol should be available during local development too.
> -for dynamic index spanning multiple indexes though, previously we would just 
> suggest a wrong symbol and now we'll also perform a wrong header insertion so 
> it is sort-of worse. but as you mentioned this should probably be handled on 
> symbol level as well.
> 
> my mental model around the fix was rather making use of the 
> declaration/definition location of the symbol, which is guaranteed to be a 
> URI, rather than a symbolic representation of the "owning" header. hence i 
> feel like this shouldn't make things harder when we are fixing that issue.
> 
> > It may give inconsistent results between TUs for the same symbol.
> 
> I didn't consider this case thoroughly, as I was assuming system search paths 
> to be same for the whole codebase (leaving multiple-projects case out).
> 
> > header-only symbols (or no definition indexed) with inconsistent -isystem. 
> > This is not terribly important, as above I think.
> 
> well, in theory we just accumulate include headers in this case, which means 
> we can still recover somehow if it turns out to be important.
> 
> > definition is missing -isystem, because *within* the library headers are 
> > not considered "system". This seems likely to be common, and I'm not sure 
> > how to address it (other than giving verbatim headers priority always, 
> > which seems dubious). It's not a regression though (just the bug doesn't 
> > get fixed for these symbols.)
> 
> ugh that's true :/ i am not really sure what's the convention for these in 
> libraries themselves, but even with clangd if we somehow end up indexing 
> definition locations for those symbols and user triggers a go-to, clangd will 
> likely use fallback commands to index those files, which will be missing 
> `-isystem` like flags. but as you mentioned this just means some cases are 
> left unfixed.
> 
> as for solution, the best i can think of is we keep storing URIs for include 
> headers, in all cases and never store verbatim. then we figure out the 
> spelled include at use time, while making use of headersearchinfo and URI 
> specific "get the spelling" logic (we probably want to have some other 
> mechanism for the latter).
> but it means we perform the resolution on every code completion item (which 
> we already do, when the stored include isn't verbatim and uri spelling 
> doesn't yield anything), so this shouldn't terribly regress completion 
> latency.
> 
> basically what i am suggesting is propagating headersearchinfo into 
> `URI::getIncludeSpelling`, which is more invasive then this change and 
> conceptually doesn't sound so appealing. here's another terrible idea :), 
> maybe we should have a module hook instead for generating custom include 
> spellings ? (but we don't want to iterate over all modules on every 
> completion i suppose, so this will definitely require some extra tricks like 
> fetching a functor from the module, also there's the question of multiple 
> modules implementing the hook..)
> 
> so yeah this is definitely more invasive, hence i'd rather go with the 
> solution proposed here until we see some complaints around this specific 
> workflow, wdyt?
> 
> > definition has -isystem (library impl considers its own headers as system) 
> > but other files don't. Also not sure how to solve this but it seems less 
> > likely to be common to me.
> 
> i think the approach mentioned above should help with this too.
> this would only happen once per indexing

I'm not quite sure what this means, am I wrong to think this happens each time 
we see a symbol in a preamble? In any case, avoiding calling this function in 
the inner loop is a separate optimization I can send a patch for.

> my mental model around the fix was rather making use of the 
> declaration/definition location of the symbol, which is guaranteed to be a URI

Of course, you're right. Nevermind this point.

> I was assuming system search paths to be same for the whole codebase

I think that assumption is broken for the combination of:
 - decent size project with meaningful subcomponents whose deps vary, like 
llvm-project (or even smaller ones)
 - a liberal use of "system", e.g. i've often seen OpenGL headers used with <>, 
to pick a random example

> well, in theory we just accumulate include headers in this case

... and I forgot we collect multiple and "vote" on them. That makes this much 
less bad I think.

>> definition is missing -isystem, because *within* the library headers are not 
>> considered "system". 
> ugh that's true :/ i am not really sure what's the convention for these in 
> libraries themselves

Well, this isn't a regression (though inconsistent) and in the large majority 
of cases we're not actually going to be indexing the definitions. So let's ship 
it and see what breaks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98242/new/

https://reviews.llvm.org/D98242

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to