sammccall added a comment.

Thanks for finding and working on this hotspot. (I'm curious *why*
`isBeforeInTranslationUnit` is slow, if you have any insight - it has a sad
worst case, but I thought we'd hit the one-element cache often enough in
practice.)

I don't see a better way to optimize this; the main question is where to do it.
In favor of TokenBuffer initializer (this patch):

- it produces the simplest API
- no code duplicated between callers
- when a TokenBuffer gets reused and it's awkward for its users to share other
state, the same cache can still be shared

In favor of doing it in the caller (e.g. SelectionTree):

- users that never call expandedTokens(SourceRange) pay nothing
- users that call expandedTokens(SourceRange) once pay `log(N) *
isBeforeInTranslationUnit` instead of `N`, which may come out ahead for large
files (a rough sketch of that lookup follows this list).
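
For context, the lookup being avoided: `expandedTokens(SourceRange)` currently
narrows the expanded-token array with two partition-point searches, each doing
~log2(N) `isBeforeInTranslationUnit` calls. A rough sketch of that shape (not
the exact code in Tokens.cpp; the free function is illustrative):

```
#include <algorithm>
#include "clang/Basic/SourceManager.h"
#include "clang/Tooling/Syntax/Tokens.h"
#include "llvm/ADT/ArrayRef.h"

// Binary-search for the closed token range [R.getBegin(), R.getEnd()]:
// first token at or after R.getBegin(), then first token strictly after
// R.getEnd(). Every comparison is an isBeforeInTranslationUnit call.
static llvm::ArrayRef<clang::syntax::Token>
tokensCoveringSketch(llvm::ArrayRef<clang::syntax::Token> Toks,
                     clang::SourceRange R, const clang::SourceManager &SM) {
  auto *Begin = std::partition_point(
      Toks.begin(), Toks.end(), [&](const clang::syntax::Token &T) {
        return SM.isBeforeInTranslationUnit(T.location(), R.getBegin());
      });
  auto *End = std::partition_point(
      Begin, Toks.end(), [&](const clang::syntax::Token &T) {
        return !SM.isBeforeInTranslationUnit(R.getEnd(), T.location());
      });
  return {Begin, End};
}
```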

It's a bit unfortunate that TokenBuffer is enough of a grab-bag of features 
that we can't really reason about which structures to build :-(

Can we get the best of both worlds?
Ideally we'd build this cache only when it's profitable. But building on-demand 
messes with const (and it's unclear whether to do it on the first call).
What about making this optional via a non-const 
`TokenBuffer::indexExpandedTokens(); // Optimizes future calls to 
expandedTokens(SourceRange). Idempotent.`?
It's a slightly weird API, but not actually hard to use (I imagine clangd would 
always call it when building ParsedAST).
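
A minimal sketch of what I have in mind, reusing the `ExpandedTokIndex` member
this patch adds (a sketch under those assumptions, not a finished
implementation):

```
// Optimizes future calls to expandedTokens(SourceRange). Idempotent.
void TokenBuffer::indexExpandedTokens() {
  if (!ExpandedTokIndex.empty())
    return; // Already built; calling this again is a no-op.
  ExpandedTokIndex.reserve(ExpandedTokens.size());
  for (unsigned I = 0, E = ExpandedTokens.size(); I != E; ++I)
    ExpandedTokIndex[ExpandedTokens[I].location()] = I;
}
```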



================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:372
+  // useful while finding expanded tokens in a 'token range'.
+  llvm::DenseMap<SourceLocation, unsigned> ExpandedTokIndex;
   llvm::DenseMap<FileID, MarkedFile> Files;
----------------
kadircet wrote:
> usaxena95 wrote:
> > kadircet wrote:
> > > this definitely makes sense, but most of the users of `TokenBuffers` are
> > > currently invoking `expandedTokens(Range)` only once, so I am not sure if
> > > this is a huge benefit on their side, whereas they'll likely end up
> > > seeing some short-lived memory consumption (and an extra traversal of all
> > > the expanded tokens).
> > > 
> > > Any reasons for not moving this logic to the application? The
> > > `expandedTokens()` endpoint will return all the tokens, no questions
> > > asked. Then the application can build whatever acceleration structure it
> > > wants. If you think there are in-tree use cases that can benefit from
> > > this caching, maybe even provide that as a separate helper?
> > > this definitely makes sense, but most of the users of TokenBuffers are 
> > > currently invoking expandedTokens(Range) only once
> > I didn't follow the point about users invoking it only once. This is not
> > caching the results of `expandedTokens(Range)`. We are precomputing these
> > to completely avoid a binary search that uses `isBeforeInTranslationUnit`
> > (which is considerably slower). Doing an extra traversal of the expanded
> > tokens is not noticeable compared to the `isBeforeInTranslationUnit`
> > latency. Users like SelectionTree (GoToDefinition, Hover, ...) can
> > directly benefit from this even though they query it only once per range.
> > 
> > Umm, if you meant use cases like "build a TokenBuffer -> call
> > expandedTokens(Range) -> destroy the TokenBuffer": I don't immediately see
> > such use cases, but I think this is still better than using
> > isBeforeInTranslationUnit.
> > My mental model is that TokenBuffer is a long-lived object (for example,
> > in ParsedAST in clangd); using it only once and then discarding it is
> > expensive in itself.
> > 
> > > Any reasons for not moving this logic to the application?
> > > If you think there are in-tree use cases that can benefit from this 
> > > caching, maybe even provide that as a separate helper?
> > This can basically benefit all users of `expandedTokens(Range)`. Since
> > expandedTokens(Range) is provided by TokenBuffer, the optimization belongs
> > in the TokenBuffer rather than in the application.
> > Users that only use this for non-token ranges will pay an additional cost,
> > though. We can potentially make this precomputation optional for such
> > cases.
> > 
> > I didn't follow the point about users invoking it only once.
> 
> Sorry, that was confusing. I was trying to say that for use cases like
> Hover/Go-To etc., clangd builds the selection tree only once. You are right
> that this might actually involve multiple calls to `expandedTokens(Range)`,
> but that shouldn't be the bottleneck of these features.
> 
> 
> > This is not caching the results of `expandedTokens(Range)`. We are 
> > precomputing these to completely avoid a binary search
> 
> Right, I was saying that all that precomputation is also probably unneeded:
> normally features are only interested in a single token, which should nest
> properly in the AST under normal circumstances and doesn't require calling
> expandedTokens for all possible ranges.
> 
> > Doing an extra traversal of the expanded tokens is not noticeable compared
> > to the `isBeforeInTranslationUnit` latency. Users like SelectionTree
> > (GoToDefinition, Hover, ...) can directly benefit from this even though
> > they query it only once per range.
> 
> I was suggesting moving this into the application because it felt
> counter-intuitive here: the benchmark you provided is actually running
> Hover (and other features) on every (spelled) token of the file. If you say
> that it is beneficial even for the normal Hover case, sure, let's go (but
> I'd be happier if you proved me wrong first 😅).
> 
> > Umm, if you meant use cases like "build a TokenBuffer -> call
> > expandedTokens(Range) -> destroy the TokenBuffer": I don't immediately see
> > such use cases, but I think this is still better than using
> > isBeforeInTranslationUnit.
> > My mental model is that TokenBuffer is a long-lived object (for example,
> > in ParsedAST in clangd); using it only once and then discarding it is
> > expensive in itself.
> 
> That's what clangd does, though (not sure if TokenBuffers have any other
> users ATM). It literally builds a new ParsedAST after every edit (it also
> has a cache to skip some work if contents didn't change between requests).
> But usually all of those requests are user-initiated, so an extra latency of
> a couple ms on every request isn't really noticeable.
> Getting rid of that would still be great, but not if it means slowing down
> the initial request for quite some time (due to the extra traversal of all
> the expanded tokens, which might be a huge deal especially on files
> including some tablegen'd stuff, as that is usually part of the main file
> and not the preamble).
> 
> > This can basically benefit all users of `expandedTokens(Range)`.
> 
> As stated above, I am just worried that this might hurt the use cases in
> which the application just builds a TokenBuffer and calls this on a few
> ranges. (Maybe clangd is the odd one out, but I don't see any other users;
> there's `clang::syntax::computeReplacements`, but it is only used in a test.)
> 
> > Since expandedTokens(Range) is provided by TokenBuffer, the optimization
> > belongs in the TokenBuffer rather than in the application.
> 
> I don't follow the reasoning here exactly. An application that needs to
> operate on every token of the expanded file can call the wrapper rather than
> using TokenBuffers directly.
> 
> But... the most dominant application in clangd's case is SelectionTree, and
> it is going to perform this computation all the time (because it wants to
> support both the batch and the single-invocation use case :/). As you
> suggested, this can be optional (or triggered with an explicit endpoint),
> but that's gonna be ugly, as it needs to be supplied when building a
> ParsedAST, ugh...
> 
> (Sorry for drawing a little circle here.) I suppose it is cleanest the way
> you propose it then (unless your application is actually building
> ParsedASTs itself without any clangd abstraction like
> TUScheduler/ClangdServer).
> It would still be great if you could check for possible regressions on the
> common path. The easiest would be to just look at `ParsedAST::build`
> latencies on a couple of files; I suppose
> `clang/lib/Serialization/ASTReader.cpp` is a good example for the
> pathological case (winning there would probably mean winning in all the
> other cases too), but it would be good to see a more "sane" file as well.
> Anything after building the `ParsedAST` is guaranteed to be as fast as
> before.
> 
> > Users that only use this for non-token ranges will pay an additional cost,
> > though. We can potentially make this precomputation optional for such
> > cases.
> 
> I suppose that to hit the slow case, users would have to provide spelled
> tokens that did not expand to anything, which should be rare, so I don't
> think that'll be a huge problem.
Long thread here :-)

Kadir: one wrinkle that may have been missed here is that clangd's
SelectionTree calls this method *many* times to establish the bounds of many
AST nodes. I fully believe this dominates SelectionTree build time (though it
would be nice to see benchmarks).
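
For reference, with the index in place the happy path of
`expandedTokens(SourceRange)` becomes two hash lookups. A rough sketch (not
necessarily the patch's exact code; the free function is illustrative):

```
// Both endpoints of a token range are real expanded-token locations; if
// either lookup fails we have a non-token range and must fall back to the
// binary search.
llvm::ArrayRef<clang::syntax::Token>
expandedTokensFast(const llvm::DenseMap<clang::SourceLocation, unsigned> &Idx,
                   llvm::ArrayRef<clang::syntax::Token> Toks,
                   clang::SourceRange R) {
  auto B = Idx.find(R.getBegin());
  auto E = Idx.find(R.getEnd());
  if (B == Idx.end() || E == Idx.end())
    return {};
  // Token ranges are closed, so include the token at R.getEnd().
  return Toks.slice(B->second, E->second - B->second + 1);
}
```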


================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:653
 
+    // Index ExpandedTokens for faster lookups by SourceLocation.
+    unsigned ExpandedIndex = 0;
----------------
I think it's worth `reserve()`ing the map size here.
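
Something like the following, assuming the surrounding loop walks the buffer's
expanded tokens (names taken from the patch context; the exact spelling may
differ):

```
// One entry per expanded token, so reserving up front avoids rehashing
// while the index is built.
ExpandedTokIndex.reserve(ExpandedTokens.size());
unsigned ExpandedIndex = 0;
for (const clang::syntax::Token &Tok : ExpandedTokens)
  ExpandedTokIndex[Tok.location()] = ExpandedIndex++;
```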


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99086/new/

https://reviews.llvm.org/D99086
