javanna commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2360398761
Thanks for taking a look @rmuir! I have been digging through the history a bit. It seems it used to be possible to get all the terms via `QueryVisitor#consumeTerms`, but that changed with https://github.com/apache/lucene/commit/267d70b66b6ac30db1d48f131b294512420f468c, which builds an automaton instead. That effectively removed the ability to get the actual terms from `TermInSetQuery` via a query visitor. Later, `getTermData` was also removed from `TermInSetQuery`, as it exposed the internal encoding of the terms.

I agree that we should look at how to make the API better in the long run. Our main use case is monitor-like: queries are stored in the index and run against incoming documents. To pre-filter the queries and reduce the number we need to run, we extract terms from them at index time and put them in a separate field that we later use for pre-filtering. I checked the monitor code, and it looks like its query visitor (from `QueryAnalyzer`) ignores this case. As a result, a `TermInSetQuery` with more than one term cannot be pre-filtered, so more queries remain candidates even though they could be excluded ahead of time if we were able to extract their terms.

I am not sure that the predicate idea would work for this scenario; I don't think we can move away from building a list of terms for this use case?

With that, I am going to go ahead and merge this PR to main. That unblocks us for now, and we can continue the discussion about the long-term approach.
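To illustrate the pre-filtering idea described above, here is a minimal, self-contained Java sketch. It deliberately does not use any Lucene API: `PrefilterSketch`, `add`, and `candidates` are hypothetical names made up for illustration, and it assumes every stored query is a plain disjunction of terms (as a `TermInSetQuery` is). A query whose terms cannot be extracted is modelled by an empty term set and therefore always stays a candidate, which is the cost mentioned in the comment.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of term-based pre-filtering; not Lucene's monitor code.
final class PrefilterSketch {
    // term -> ids of stored queries that contain this term
    private final Map<String, Set<String>> termToQueryIds = new HashMap<>();
    // queries whose terms could not be extracted; they can never be excluded
    private final Set<String> alwaysCandidates = new TreeSet<>();

    // "Index time": register a stored query with the terms extracted from it.
    // An empty set stands for "term extraction was not possible".
    public void add(String queryId, Set<String> extractedTerms) {
        if (extractedTerms.isEmpty()) {
            alwaysCandidates.add(queryId);
            return;
        }
        for (String term : extractedTerms) {
            termToQueryIds.computeIfAbsent(term, k -> new TreeSet<>()).add(queryId);
        }
    }

    // "Match time": return the ids of queries that still need to be run
    // against a document containing the given terms.
    public Set<String> candidates(Collection<String> docTerms) {
        Set<String> result = new TreeSet<>(alwaysCandidates);
        for (String term : docTerms) {
            result.addAll(termToQueryIds.getOrDefault(term, Collections.<String>emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        PrefilterSketch index = new PrefilterSketch();
        index.add("q1", new HashSet<>(Arrays.asList("apple", "pear")));
        index.add("q2", new HashSet<>(Arrays.asList("kiwi")));
        index.add("q3", Collections.<String>emptySet()); // terms not extractable

        // q2 is excluded up front; q3 must be run because nothing is known about it.
        System.out.println(index.candidates(Arrays.asList("apple", "banana"))); // prints [q1, q3]
    }
}
```

In this sketch, `q3` is returned for every document, which is the situation a multi-term `TermInSetQuery` is in when its terms are not exposed to the visitor.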