sammccall accepted this revision. sammccall added a comment. This revision is now accepted and ready to land.
I think this is going to be more good than bad, and the alternatives are terrifically complicated. ================ Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:801 + if (IsSystem) + return "<" + ShorterInclude + ">"; // Standard case: just insert the file itself. ---------------- kadircet wrote: > sammccall wrote: > > This is a great heuristic, now I'm thinking of all the ways it can go > > wrong... > > > > --- > > > > It could be slow: it's doing a string comparison for every include path > > entry (not just the -isystems) and that can be a long list for big > > codebases. We're calling this for most symbols we encounter, so in > > principle this looks quadratic in codebase size when indexing a preamble. > > > > It's tempting to throw a cache on it but actually... the callsite looks > > like: > > ``` > > foreach header: > > foreach symbol: > > adjust(getIncludeHeader(symbol, header)); > > ``` > > > > getIncludeHeader() is entirely dependent on the header almost ignores the > > symbol, so hoisting it out of the loop would be as good as caching it. > > > > The wrinkle is the std::move hack, but we can move *that* part alone inside > > the loop instead. > > > > --- > > > > It could fail to translate properly between paths, being inserted where the > > -isystem wasn't available. Often this means the *library* isn't available > > in this part of the codebase, so really the problem is offering the symbol > > at all (though using verbatim headers may make this harder to fix). Dynamic > > indexes spanning multiple projects are another case, but I think a marginal > > one. I don't think we need to address this. > > > > --- > > > > It may give inconsistent results between TUs for the same symbol. > > When a definition is indexed, it should win, so the problematic cases are: > > - header-only symbols (or no definition indexed) with inconsistent > > -isystem. This is not terribly important, as above I think. > > - definition is missing -isystem, because *within* the library headers are > > not considered "system". This seems likely to be common, and I'm not sure > > how to address it (other than giving verbatim headers priority always, > > which seems dubious). It's not a regression though (just the bug doesn't > > get fixed for these symbols.) > > - definition has -isystem (library impl considers its own headers as > > system) but other files don't. Also not sure how to solve this but it seems > > less likely to be common to me. > > > > One design around these problems would be to store *both* a preferred > > spelling and a filename, and to use the spelling if it can be resolved. But > > this is invasive, means doing stat()s, doesn't work well with genfiles... > > I'm not really sure of a clean solution here. > > > > Maybe we just move ahead as-is and see if we hit problems... > for latency concerns, i was mostly turning a blind eye as this would only > happen once per indexing. but you are right, we can get away from any > possible regressions by moving it to per-header layer. > > --- > > > It could fail to translate properly between paths, being inserted where the > > -isystem wasn't available. Often this means the *library* isn't available > > in this part of the codebase, so really the problem is offering the symbol > > at all (though using verbatim headers may make this harder to fix). Dynamic > > indexes spanning multiple projects are another case, but I think a marginal > > one. I don't think we need to address this. > > as you mentioned this is likely to show up either when there's a > remote-index/static-index built on another machine in use or dynamic index > with multiple projects. > - for remote-index/static-index case, i think it is unlikely to show up, as > the symbol should be available during local development too. > -for dynamic index spanning multiple indexes though, previously we would just > suggest a wrong symbol and now we'll also perform a wrong header insertion so > it is sort-of worse. but as you mentioned this should probably be handled on > symbol level as well. > > my mental model around the fix was rather making use of the > declaration/definition location of the symbol, which is guaranteed to be a > URI, rather than a symbolic representation of the "owning" header. hence i > feel like this shouldn't make things harder when we are fixing that issue. > > > It may give inconsistent results between TUs for the same symbol. > > I didn't consider this case thoroughly, as I was assuming system search paths > to be same for the whole codebase (leaving multiple-projects case out). > > > header-only symbols (or no definition indexed) with inconsistent -isystem. > > This is not terribly important, as above I think. > > well, in theory we just accumulate include headers in this case, which means > we can still recover somehow if it turns out to be important. > > > definition is missing -isystem, because *within* the library headers are > > not considered "system". This seems likely to be common, and I'm not sure > > how to address it (other than giving verbatim headers priority always, > > which seems dubious). It's not a regression though (just the bug doesn't > > get fixed for these symbols.) > > ugh that's true :/ i am not really sure what's the convention for these in > libraries themselves, but even with clangd if we somehow end up indexing > definition locations for those symbols and user triggers a go-to, clangd will > likely use fallback commands to index those files, which will be missing > `-isystem` like flags. but as you mentioned this just means some cases are > left unfixed. > > as for solution, the best i can think of is we keep storing URIs for include > headers, in all cases and never store verbatim. then we figure out the > spelled include at use time, while making use of headersearchinfo and URI > specific "get the spelling" logic (we probably want to have some other > mechanism for the latter). > but it means we perform the resolution on every code completion item (which > we already do, when the stored include isn't verbatim and uri spelling > doesn't yield anything), so this shouldn't terribly regress completion > latency. > > basically what i am suggesting is propagating headersearchinfo into > `URI::getIncludeSpelling`, which is more invasive then this change and > conceptually doesn't sound so appealing. here's another terrible idea :), > maybe we should have a module hook instead for generating custom include > spellings ? (but we don't want to iterate over all modules on every > completion i suppose, so this will definitely require some extra tricks like > fetching a functor from the module, also there's the question of multiple > modules implementing the hook..) > > so yeah this is definitely more invasive, hence i'd rather go with the > solution proposed here until we see some complaints around this specific > workflow, wdyt? > > > definition has -isystem (library impl considers its own headers as system) > > but other files don't. Also not sure how to solve this but it seems less > > likely to be common to me. > > i think the approach mentioned above should help with this too. > this would only happen once per indexing I'm not quite sure what this means, am I wrong to think this happens each time we see a symbol in a preamble? In any case, avoiding calling this function in the inner loop is a separate optimization I can send a patch for. > my mental model around the fix was rather making use of the > declaration/definition location of the symbol, which is guaranteed to be a URI Of course, you're right. Nevermind this point. > I was assuming system search paths to be same for the whole codebase I think that assumption is broken for the combination of: - decent size project with meaningful subcomponents whose deps vary, like llvm-project (or even smaller ones) - a liberal use of "system", e.g. i've often seen OpenGL headers used with <>, to pick a random example > well, in theory we just accumulate include headers in this case ... and I forgot we collect multiple and "vote" on them. That makes this much less bad I think. >> definition is missing -isystem, because *within* the library headers are not >> considered "system". > ugh that's true :/ i am not really sure what's the convention for these in > libraries themselves Well, this isn't a regression (though inconsistent) and in the large majority of cases we're not actually going to be indexing the definitions. So let's ship it and see what breaks. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98242/new/ https://reviews.llvm.org/D98242 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits