sammccall added a comment. I've landed the caching patch so this is good to go. The merge is *conceptually* trivial but the enclosing function has moved so you may need to reapply it by hand, sorry...
================ Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:801 + if (IsSystem) + return "<" + ShorterInclude + ">"; // Standard case: just insert the file itself. ---------------- sammccall wrote: > kadircet wrote: > > sammccall wrote: > > > This is a great heuristic, now I'm thinking of all the ways it can go > > > wrong... > > > > > > --- > > > > > > It could be slow: it's doing a string comparison for every include path > > > entry (not just the -isystems) and that can be a long list for big > > > codebases. We're calling this for most symbols we encounter, so in > > > principle this looks quadratic in codebase size when indexing a preamble. > > > > > > It's tempting to throw a cache on it but actually... the callsite looks > > > like: > > > ``` > > > foreach header: > > > foreach symbol: > > > adjust(getIncludeHeader(symbol, header)); > > > ``` > > > > > > getIncludeHeader() is entirely dependent on the header almost ignores the > > > symbol, so hoisting it out of the loop would be as good as caching it. > > > > > > The wrinkle is the std::move hack, but we can move *that* part alone > > > inside the loop instead. > > > > > > --- > > > > > > It could fail to translate properly between paths, being inserted where > > > the -isystem wasn't available. Often this means the *library* isn't > > > available in this part of the codebase, so really the problem is offering > > > the symbol at all (though using verbatim headers may make this harder to > > > fix). Dynamic indexes spanning multiple projects are another case, but I > > > think a marginal one. I don't think we need to address this. > > > > > > --- > > > > > > It may give inconsistent results between TUs for the same symbol. > > > When a definition is indexed, it should win, so the problematic cases are: > > > - header-only symbols (or no definition indexed) with inconsistent > > > -isystem. This is not terribly important, as above I think. > > > - definition is missing -isystem, because *within* the library headers > > > are not considered "system". This seems likely to be common, and I'm not > > > sure how to address it (other than giving verbatim headers priority > > > always, which seems dubious). It's not a regression though (just the bug > > > doesn't get fixed for these symbols.) > > > - definition has -isystem (library impl considers its own headers as > > > system) but other files don't. Also not sure how to solve this but it > > > seems less likely to be common to me. > > > > > > One design around these problems would be to store *both* a preferred > > > spelling and a filename, and to use the spelling if it can be resolved. > > > But this is invasive, means doing stat()s, doesn't work well with > > > genfiles... I'm not really sure of a clean solution here. > > > > > > Maybe we just move ahead as-is and see if we hit problems... > > for latency concerns, i was mostly turning a blind eye as this would only > > happen once per indexing. but you are right, we can get away from any > > possible regressions by moving it to per-header layer. > > > > --- > > > > > It could fail to translate properly between paths, being inserted where > > > the -isystem wasn't available. Often this means the *library* isn't > > > available in this part of the codebase, so really the problem is offering > > > the symbol at all (though using verbatim headers may make this harder to > > > fix). Dynamic indexes spanning multiple projects are another case, but I > > > think a marginal one. I don't think we need to address this. > > > > as you mentioned this is likely to show up either when there's a > > remote-index/static-index built on another machine in use or dynamic index > > with multiple projects. > > - for remote-index/static-index case, i think it is unlikely to show up, as > > the symbol should be available during local development too. > > -for dynamic index spanning multiple indexes though, previously we would > > just suggest a wrong symbol and now we'll also perform a wrong header > > insertion so it is sort-of worse. but as you mentioned this should probably > > be handled on symbol level as well. > > > > my mental model around the fix was rather making use of the > > declaration/definition location of the symbol, which is guaranteed to be a > > URI, rather than a symbolic representation of the "owning" header. hence i > > feel like this shouldn't make things harder when we are fixing that issue. > > > > > It may give inconsistent results between TUs for the same symbol. > > > > I didn't consider this case thoroughly, as I was assuming system search > > paths to be same for the whole codebase (leaving multiple-projects case > > out). > > > > > header-only symbols (or no definition indexed) with inconsistent > > > -isystem. This is not terribly important, as above I think. > > > > well, in theory we just accumulate include headers in this case, which > > means we can still recover somehow if it turns out to be important. > > > > > definition is missing -isystem, because *within* the library headers are > > > not considered "system". This seems likely to be common, and I'm not sure > > > how to address it (other than giving verbatim headers priority always, > > > which seems dubious). It's not a regression though (just the bug doesn't > > > get fixed for these symbols.) > > > > ugh that's true :/ i am not really sure what's the convention for these in > > libraries themselves, but even with clangd if we somehow end up indexing > > definition locations for those symbols and user triggers a go-to, clangd > > will likely use fallback commands to index those files, which will be > > missing `-isystem` like flags. but as you mentioned this just means some > > cases are left unfixed. > > > > as for solution, the best i can think of is we keep storing URIs for > > include headers, in all cases and never store verbatim. then we figure out > > the spelled include at use time, while making use of headersearchinfo and > > URI specific "get the spelling" logic (we probably want to have some other > > mechanism for the latter). > > but it means we perform the resolution on every code completion item (which > > we already do, when the stored include isn't verbatim and uri spelling > > doesn't yield anything), so this shouldn't terribly regress completion > > latency. > > > > basically what i am suggesting is propagating headersearchinfo into > > `URI::getIncludeSpelling`, which is more invasive then this change and > > conceptually doesn't sound so appealing. here's another terrible idea :), > > maybe we should have a module hook instead for generating custom include > > spellings ? (but we don't want to iterate over all modules on every > > completion i suppose, so this will definitely require some extra tricks > > like fetching a functor from the module, also there's the question of > > multiple modules implementing the hook..) > > > > so yeah this is definitely more invasive, hence i'd rather go with the > > solution proposed here until we see some complaints around this specific > > workflow, wdyt? > > > > > definition has -isystem (library impl considers its own headers as > > > system) but other files don't. Also not sure how to solve this but it > > > seems less likely to be common to me. > > > > i think the approach mentioned above should help with this too. > > this would only happen once per indexing > > I'm not quite sure what this means, am I wrong to think this happens each > time we see a symbol in a preamble? In any case, avoiding calling this > function in the inner loop is a separate optimization I can send a patch for. > > > my mental model around the fix was rather making use of the > > declaration/definition location of the symbol, which is guaranteed to be a > > URI > > Of course, you're right. Nevermind this point. > > > I was assuming system search paths to be same for the whole codebase > > I think that assumption is broken for the combination of: > - decent size project with meaningful subcomponents whose deps vary, like > llvm-project (or even smaller ones) > - a liberal use of "system", e.g. i've often seen OpenGL headers used with > <>, to pick a random example > > > well, in theory we just accumulate include headers in this case > > ... and I forgot we collect multiple and "vote" on them. That makes this much > less bad I think. > > >> definition is missing -isystem, because *within* the library headers are > >> not considered "system". > > ugh that's true :/ i am not really sure what's the convention for these in > > libraries themselves > > Well, this isn't a regression (though inconsistent) and in the large majority > of cases we're not actually going to be indexing the definitions. So let's > ship it and see what breaks. > It's tempting to throw a cache on it but actually... the callsite looks like: > ``` > foreach header: > foreach symbol: > adjust(getIncludeHeader(symbol, header)); > ``` Haha, actually I was completely misreading the code, sorry for the detour here. (Worked out well in the end, mostly due to luck and a bit of stubbornness though) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98242/new/ https://reviews.llvm.org/D98242 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits