[GitHub] [lucene] rmuir commented on pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available

via GitHub Sat, 04 Feb 2023 17:05:35 -0800


rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416891188


   > Anyway, back to the point about complexity vs. benefit, I 100% agree that 
relying on `IndexOrDocValues` would be preferable if we can solve for cost 
over-estimating. I'd pursued this path after some feedback from @jpountz 
([#11741 (comment)](https://github.com/apache/lucene/pull/11741#issuecomment-1241681411)) 
that it may make sense to take this type of approach, but I'm all for keeping 
it simple if we can here. I'll keep an eye out for your draft PR. Thanks for 
the engagement on this!
   
   Yeah, overall my concern is not so much with your specific query, just that 
it doesn't scale to many other queries. It is only a matter of time before 
someone says "hey, we can really speed up many other slow use-cases by adding 
KeywordField.newPrefixQuery, newWildCardQuery, etc. by using the docvalues", 
and it's true (for selective queries, you could avoid intersecting against any 
large terms dict at all and instead do a couple of per-doc, very-high-cost 
lookupTerm + runautomaton matches). So it's an example where I'd like to avoid 
a mess: it would be better to fix APIs such as MultiTermQuery, 
IndexOrDocValuesQuery, ScorerSupplier, etc. so that we can make things work 
cleanly with simpler queries that are easier to maintain and to test for 
correctness.
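
   A rough sketch of the per-doc idea mentioned above, in plain Java rather 
than real Lucene APIs. Everything here is an assumption made for illustration: 
the class name `DocValuesPrefixSketch`, the array-based "terms dict", and 
`String.startsWith` standing in for an automaton match. It only shows the 
shape of the trade-off: a lead clause supplies a few candidate docs, and each 
one pays for a lookup plus a match instead of the query intersecting the whole 
terms dict up front.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a doc-values prefix check; this is not Lucene code.
final class DocValuesPrefixSketch {

    // termDict:   sorted unique terms, where array index == ordinal (assumed layout).
    // docOrds:    docOrds[doc] is the ordinal of that doc's single value.
    // candidates: doc ids already selected by a cheaper, more selective lead clause.
    static List<Integer> matchingDocs(String[] termDict, int[] docOrds,
                                      int[] candidates, String prefix) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            // Per-doc lookup: resolve the doc's ordinal to its term.
            String term = termDict[docOrds[doc]];
            // startsWith stands in for running an automaton over the term bytes.
            if (term.startsWith(prefix)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

   The cost here grows with the number of candidates, not with the size of the 
terms dict, which is why this only pays off for selective queries.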
   
   But practically, right now the benchmark is not a very fair comparison, 
because some of these queries have obvious, straightforward performance 
deficiencies, DocValuesTermsQuery especially.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


