sammccall added a comment.

Thanks for finding and working on this hotspot. (I'm curious *why* `isBeforeInTranslationUnit` is slow if you have any insight - it has a sad worst case, but I thought we'd hit the one-element cache often enough in practice.)
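(An editorial aside for readers skimming the thread: below is a minimal sketch of the two lookup strategies being weighed. It is illustrative only - `findExpandedToken` is a hypothetical helper, not the patch's API - though the map type mirrors the `ExpandedTokIndex` member shown further down.)

  #include "clang/Basic/SourceLocation.h"
  #include "clang/Basic/SourceManager.h"
  #include "clang/Tooling/Syntax/Tokens.h"
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/STLExtras.h"

  using namespace clang;

  // Status quo: a binary search, O(log N) comparisons, but every comparison
  // calls SourceManager::isBeforeInTranslationUnit(), which may walk the
  // include/expansion tree and only caches the last FileID pair it compared.
  const syntax::Token *
  findExpandedToken(llvm::ArrayRef<syntax::Token> Expanded, SourceLocation Loc,
                    const SourceManager &SM) {
    auto It = llvm::partition_point(Expanded, [&](const syntax::Token &T) {
      return SM.isBeforeInTranslationUnit(T.location(), Loc);
    });
    return It == Expanded.end() ? nullptr : &*It;
  }

  // With an index as in this patch: one O(N) pass up front to build the map,
  // then each lookup is a single hash probe with no SourceManager queries.
  const syntax::Token *
  findExpandedToken(llvm::ArrayRef<syntax::Token> Expanded, SourceLocation Loc,
                    const llvm::DenseMap<SourceLocation, unsigned> &Index) {
    auto It = Index.find(Loc);
    return It == Index.end() ? nullptr : &Expanded[It->second];
  }

The thread below is essentially about whether the one-time O(N) pass to populate the map pays for itself across a caller's lookups.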
I don't see a better way to optimize this; the main question is where to do it.

In favor of the TokenBuffer initializer (this patch):

- it produces the simplest API
- no code duplicated between callers
- when a TokenBuffer gets reused and it's awkward for its users to share other state, the same cache can be reused

In favor of doing it in the caller (e.g. SelectionTree):

- users that never call expandedTokens(SourceRange) pay nothing
- users that call expandedTokens(SourceRange) once pay `log(N) * isBeforeInTranslationUnit` instead of `N`, which may come out ahead for large files

It's a bit unfortunate that TokenBuffer is enough of a grab-bag of features that we can't really reason about which structures to build :-(

Can we get the best of both worlds? Ideally we'd build this cache only when it's profitable, but building it on demand messes with const (and it's unclear whether to do it on the first call).

What about making this optional via a non-const `TokenBuffer::indexExpandedTokens(); // Optimizes future calls to expandedTokens(SourceRange). Idempotent.`? It's a slightly weird API, but not actually hard to use (I imagine clangd would always call it when building ParsedAST).

================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:372
+  // useful while finding expanded tokens in a 'token range'.
+  llvm::DenseMap<SourceLocation, unsigned> ExpandedTokIndex;
   llvm::DenseMap<FileID, MarkedFile> Files;
----------------
kadircet wrote:
> usaxena95 wrote:
> > kadircet wrote:
> > > this definitely makes sense, but most of the users of `TokenBuffers` are currently invoking `expandedTokens(Range)` only once, so i am not sure if this is a huge benefit on their side, whereas they'll likely end up seeing some short-lived memory consumption (and an extra traversal of all the expanded tokens).
> > >
> > > Any reasons for not moving this logic to the application? The `expandedTokens()` endpoint will return all the tokens, no questions asked. Then the application can build whatever acceleration structure it wants. If you think there are in-tree use cases that can benefit from this caching, maybe even provide that as a separate helper?
> >
> > > this definitely makes sense, but most of the users of TokenBuffers are currently invoking expandedTokens(Range) only once
> >
> > I didn't get that users are invoking it only once. This is not caching the results of `expandedTokens(Range)`. We are precomputing these to completely avoid a binary search which uses `isBeforeInTranslationUnit` (considerably slower). Doing an extra traversal of the expanded tokens is not noticeable compared to the `isBeforeInTranslationUnit` latency. Users like SelectionTree (GoToDefinition, Hover, ...) can directly benefit from this even though they query this only once per some range.
> >
> > Umm. If you meant use cases like: build token buffer -> use expandedTokens(Range) -> destroy TokenBuffer: I don't immediately see such use cases, but I think it is still better than using isBeforeInTranslationUnit. My mental model is that a TokenBuffer is a long-lived object (for example, in ParsedAST in clangd), and using it only once and then discarding it is expensive in itself.
> >
> > > Any reasons for not moving this logic to the application? If you think there are in-tree use cases that can benefit from this caching, maybe even provide that as a separate helper?
> >
> > This can basically benefit all users of `expandedToken(Range)`.
> > Since `expandedToken(Range)` is provided by TokenBuffer, the optimization should also live in the TokenBuffer rather than in the application. Users which only use this for non-token ranges will pay an additional cost, though; we can potentially make this precomputation optional for such cases.
>
> > I didn't get that users are invoking it only once.
>
> Sorry, that was confusing. I was trying to say that for use cases like Hover/Go-To etc., clangd builds the selection tree only once, and you are right, that might actually involve multiple calls to `expandedTokens(Range)` - but that shouldn't be the bottleneck of these features.
>
> > This is not caching the results of `expandedTokens(Range)`. We are precomputing these to completely avoid a binary search
>
> right, i was saying that all that precomputation is also probably unneeded, as normally features are only interested in a single token, which should nest properly in the AST under normal circumstances and doesn't require calling expandedTokens for all possible ranges.
>
> > Doing an extra traversal of the expanded tokens is not noticeable compared to the `isBeforeInTranslationUnit` latency. Users like SelectionTree (GoToDefinition, Hover, ...) can directly benefit from this even though they query this only once per some range.
>
> I was suggesting moving this into the application because that felt counter-intuitive: the benchmark you provided is actually running Hover (and other features) on every (spelled) token of the file. If you say that it is beneficial even for the normal Hover case, sure, let's go (but i'd be happier if you proved me wrong first 😅).
>
> > Umm. If you meant use cases like: build token buffer -> use expandedTokens(Range) -> destroy TokenBuffer: I don't immediately see such use cases, but I think it is still better than using isBeforeInTranslationUnit. My mental model is that a TokenBuffer is a long-lived object (for example, in ParsedAST in clangd), and using it only once and then discarding it is expensive in itself.
>
> That's what clangd does, though (not sure if TokenBuffers have any other users ATM). It literally builds a new ParsedAST after every edit (it also has a cache to skip some work if contents didn't change between requests). But usually all of those requests are user-initiated, so an extra latency of a couple of ms on every request isn't really noticeable. Getting rid of that would still be great, but not if it means slowing down the initial request for quite some time (due to the extra traversal of all the expanded tokens, which might be a huge deal especially on files including some tablegen'd stuff, as that is usually part of the main file and not the preamble).
>
> > This can basically benefit all users of `expandedToken(Range)`.
>
> As stated above, i am just worried that this might hurt the use cases in which the application just builds a TokenBuffer and calls this on a few ranges. (Maybe clangd is the odd one out, but i don't see any other users; there's `clang::syntax::computeReplacements`, but it is only used in a test.)
>
> > Since `expandedToken(Range)` is provided by TokenBuffer, the optimization should also live in the TokenBuffer rather than in the application.
>
> I don't follow the reasoning here exactly. An application that needs to operate on every token of the expanded file can call a wrapper rather than using TokenBuffers directly.
> But the most dominant application in clangd's case is SelectionTree, and it is going to perform this computation all the time (because it wants to support both the batch and the single-invocation use case :/). As you suggested, this can be optional (or triggered with an explicit endpoint), but that's gonna be ugly, as it needs to be supplied when building a ParsedAST, ugh..
>
> (Sorry for drawing a little circle here.) I suppose it is cleanest the way you propose it then (unless your application is actually building ParsedASTs itself without any clangd abstraction like TUScheduler/ClangdServer).
>
> It would still be great if you could check for possible regressions on the common path. The easiest would be to just look at `ParsedAST::build` latencies on a couple of files; i suppose `clang/lib/Serialization/ASTReader.cpp` is a good example for the pathological case (winning there would probably mean winning in all the other cases too), but it would be good to see a more "sane" file as well. Anything after building the `ParsedAST` is guaranteed to be as fast as before.
>
> > Users which only use this for non-token ranges will pay an additional cost, though; we can potentially make this precomputation optional for such cases.
>
> I suppose to hit the slow case users would have to provide spelled tokens that did not expand to anything, which should be rare, so I don't think that'll be a huge problem.

Long thread here :-)

Kadir: one wrinkle that's maybe missed here is that clangd's SelectionTree calls this method *many* times to establish the bounds of many AST nodes. I fully believe this is dominating SelectionTree build time (though it would be nice to see benchmarks).

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:653
+  // Index ExpandedTokens for faster lookups by SourceLocation.
+  unsigned ExpandedIndex = 0;
----------------
I think it's worth reserve()ing the map size here.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99086
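(A closing editorial sketch of the reserve() suggestion at Tokens.cpp:653 above. This is hypothetical code, not the patch's: `buildExpandedTokenIndex` is an invented name, and the map type follows the `ExpandedTokIndex` member from Tokens.h.)

  #include "clang/Basic/SourceLocation.h"
  #include "clang/Tooling/Syntax/Tokens.h"
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/DenseMap.h"

  using namespace clang;

  // Build the SourceLocation -> token-index map in a single pass.
  void buildExpandedTokenIndex(llvm::ArrayRef<syntax::Token> ExpandedTokens,
                               llvm::DenseMap<SourceLocation, unsigned> &Index) {
    // Without the reserve(), DenseMap may rehash several times as it grows;
    // the element count is known before the loop, so reserving the final
    // size up front keeps the build to one allocation.
    Index.reserve(ExpandedTokens.size());
    for (unsigned I = 0, E = ExpandedTokens.size(); I != E; ++I)
      Index[ExpandedTokens[I].location()] = I;
  }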