javanna commented on PR #13806:
URL: https://github.com/apache/lucene/pull/13806#issuecomment-2360398761

   Thanks for taking a look @rmuir !
   
   I have been digging a bit through the history. It seems that it used to be 
possible to get all the terms via `QueryVisitor#consumeTerms`, but that changed 
when 
https://github.com/apache/lucene/commit/267d70b66b6ac30db1d48f131b294512420f468c
 switched to building an automaton instead. That effectively removed the 
ability to get the actual terms from `TermInSetQuery` via a query visitor. 
Later, `getTermData` was also removed from `TermInSetQuery`, as it exposed the 
internal encoding of the terms. I agree that we should look at how to make the 
API better in the long run. 
   
   Our main use case is monitor-like: queries are stored in the index and run 
against incoming documents. To pre-filter the queries and reduce how many of 
them we need to run, we extract terms from them at index time and put them in a 
separate field that we later use for pre-filtering. I checked the monitor code, 
and it looks like its query visitor (from `QueryAnalyzer`) ignores this case, 
hence a `TermInSetQuery` with more than one term does not allow pre-filtering. 
That means more queries remain potential candidates, even though they could be 
excluded ahead of time if we were able to extract terms from them. I am not 
sure that the predicate idea would work for this scenario: I don't think we can 
move away from building a list of terms for this use case.
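   
   To make the pre-filtering idea concrete, here is a minimal sketch in plain 
Java that does not use Lucene at all. Every name in it (`TermPrefilterSketch`, 
`register`, `candidates`) is hypothetical and only illustrates the idea, it is 
not how the monitor module or our code is actually implemented. It assumes 
purely disjunctive queries, which is what a `TermInSetQuery` is, and it treats 
any query whose terms cannot be enumerated as a permanent candidate.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of monitor-style pre-filtering. Hypothetical names; not Lucene code. */
public class TermPrefilterSketch {
    // term -> ids of stored queries that contain that term
    private final Map<String, Set<String>> queriesByTerm = new HashMap<>();
    // queries whose terms could not be extracted; they can never be filtered out
    private final Set<String> unfilterable = new TreeSet<>();

    /**
     * Registers a disjunctive query under the terms extracted from it.
     * Pass an empty collection when the terms are not enumerable,
     * for example when only a predicate or automaton is available.
     */
    public void register(String queryId, Collection<String> extractedTerms) {
        if (extractedTerms.isEmpty()) {
            unfilterable.add(queryId);
            return;
        }
        for (String term : extractedTerms) {
            queriesByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(queryId);
        }
    }

    /** Returns the ids of the queries that still have to be run against a document. */
    public Set<String> candidates(Collection<String> documentTerms) {
        Set<String> result = new TreeSet<>(unfilterable);
        for (String term : documentTerms) {
            result.addAll(queriesByTerm.getOrDefault(term, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        TermPrefilterSketch index = new TermPrefilterSketch();
        index.register("q1", List.of("lucene", "solr")); // terms were extractable
        index.register("q2", List.of("kafka"));          // terms were extractable
        index.register("q3", List.of());                 // terms not extractable
        // q2 is excluded ahead of time; q3 stays a candidate for every document.
        System.out.println(index.candidates(List.of("lucene", "search"))); // [q1, q3]
    }
}
```

   In this sketch `q3` stands in for a query that exposes no terms: it is a 
candidate for every incoming document. A predicate can test a term that is 
already known, but here the terms have to be written into a field when the 
query is indexed, which is why an enumerable list seems hard to avoid.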
   
   With that, I am going to go ahead and merge this PR to main for now. That 
unblocks us, and we can continue the discussion about the long-term 
approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

