rmuir commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416891188
> Anyway, back to the point about complexity vs. benefit, I 100% agree that relying on `IndexOrDocValues` would be preferable if we can solve for cost over-estimating. I'd pursued this path after some feedback from @jpountz ([#11741 (comment)](https://github.com/apache/lucene/pull/11741#issuecomment-1241681411)) that it may make sense to take this type of approach, but I'm all for keeping it simple if we can here. I'll keep an eye out for your draft PR. Thanks for the engagement on this!

Yeah, overall my concern is not so much with your specific query, just that it doesn't scale to many other queries. It is only a matter of time before someone says "hey, we can really speed up many other slow use-cases by adding `KeywordField.newPrefixQuery`, `newWildCardQuery`, etc. by using the docvalues", and it's true (for selective queries, you could avoid intersecting against any large terms dict at all and instead do a couple of very-high-cost per-doc `lookupTerm` + runautomaton matches).

So this is a case where I'd like to avoid a mess: it would be better to fix APIs such as `MultiTermQuery`, `IndexOrDocValuesQuery`, `ScorerSupplier`, etc. so that things work cleanly with simpler queries that are easier to test for correctness and to maintain.

But practically speaking, right now the benchmark is not a very fair comparison, because some of these queries just have obvious, straightforward performance deficiencies, `DocValuesTermsQuery` especially.
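To make the trade-off concrete, here is a toy, self-contained Java sketch with no Lucene dependency. `SelectiveMatchSketch`, `TERMS`, `DOC_ORDS`, `viaTermsDictionary`, and `viaDocValues` are hypothetical names for illustration only. The index-style path visits every term in the dictionary before it can answer; the doc-values-style path only does a per-doc ordinal-to-term lookup plus a match for each candidate produced by a selective lead clause. A `startsWith` predicate stands in for running an automaton. It does not model the actual cost logic in `IndexOrDocValuesQuery` or `ScorerSupplier`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of one segment with a single-valued keyword field; not Lucene's API. */
public class SelectiveMatchSketch {
    // Hypothetical sorted terms dictionary: ordinal i maps to TERMS[i].
    static final String[] TERMS = {"apple", "apricot", "banana", "blueberry", "cherry"};
    // Hypothetical per-document ordinals, standing in for sorted doc values.
    static final int[] DOC_ORDS = {2, 0, 4, 1, 3, 0, 2};
    // Inverted index built from DOC_ORDS: ordinal -> ascending doc ids ("postings").
    static final List<List<Integer>> POSTINGS = buildPostings();

    static List<List<Integer>> buildPostings() {
        List<List<Integer>> postings = new ArrayList<>();
        for (int ord = 0; ord < TERMS.length; ord++) {
            postings.add(new ArrayList<>());
        }
        for (int doc = 0; doc < DOC_ORDS.length; doc++) {
            postings.get(DOC_ORDS[doc]).add(doc);
        }
        return postings;
    }

    /** Index path: visit every term in the dictionary, then filter the lead candidates. */
    static List<Integer> viaTermsDictionary(int[] leadDocs, Predicate<String> matcher) {
        BitSet matchingDocs = new BitSet(DOC_ORDS.length);
        for (int ord = 0; ord < TERMS.length; ord++) {   // work grows with dictionary size
            if (matcher.test(TERMS[ord])) {
                for (int doc : POSTINGS.get(ord)) {
                    matchingDocs.set(doc);
                }
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int doc : leadDocs) {
            if (matchingDocs.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Doc-values path: look up each lead candidate's term and test it directly. */
    static List<Integer> viaDocValues(int[] leadDocs, Predicate<String> matcher) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : leadDocs) {                       // work grows with number of candidates
            String term = TERMS[DOC_ORDS[doc]];          // per-doc ordinal -> term lookup
            if (matcher.test(term)) {                    // per-doc match (stand-in for an automaton run)
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] leadDocs = {1, 4, 6};                             // a selective lead clause
        Predicate<String> prefixB = t -> t.startsWith("b");     // hypothetical prefix query "b*"
        System.out.println(viaTermsDictionary(leadDocs, prefixB)); // prints [4, 6]
        System.out.println(viaDocValues(leadDocs, prefixB));       // prints [4, 6]
    }
}
```

Both paths return the same hits; what differs is where the work goes. The first scales with the terms dictionary, the second with the number of lead candidates, which is why choosing between them only pays off when the cost estimates are accurate.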