[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436972#comment-17436972 ]
Greg Miller commented on LUCENE-10207:
--------------------------------------

[~rcmuir] as a cost heuristic for running the term-based scorer, I agree that sumDocFreq() is a better fit than getDocCount(). But, I thought that {{ScorerSupplier#cost()}} was meant to estimate the number of docs the scorer would produce if leading iteration. Am I misunderstanding that? Thanks!

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-10207
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10207
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-10207_multitermquery.patch
>
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a
> query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery.
> However IndexOrDocValuesQuery only works well if the "index" query can give
> an estimation of the cost of the query without doing anything expensive (like
> looking up all terms of the TermInSetQuery in the terms dict). Maybe we could
> implement it for primary keys (terms.size() == sumDocFreq) by returning the
> number of terms of the query? Another idea is to multiply the number of terms
> by the average postings length, though this could be dangerous if the field
> has a zipfian distribution and some terms have a much higher doc frequency
> than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently
> [~mikemccand] and [~gsmiller] again independently. So it looks like there is
> interest in this. Here is an email thread where this was recently discussed:
> https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
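
For concreteness, the two estimates floated in the issue description (term count for primary-key fields, otherwise term count times average postings length) could be sketched roughly as below. This is only an illustration and not what the attached patch does: the class and method names are hypothetical, and the inputs are plain longs that a caller would presumably read from Terms#size(), Terms#getSumDocFreq() and Terms#getDocCount(). Capping the result at the doc count and falling back to the doc count when statistics are missing are assumptions added here, in line with the reading of cost() as the number of docs the scorer would produce.

```java
// Hypothetical sketch; none of these names exist in Lucene.
final class TermInSetCostSketch {
    private TermInSetCostSketch() {}

    /**
     * Estimates how many docs a terms-in-set scorer might produce, using only
     * field-level statistics (no per-term lookups in the terms dict).
     *
     * @param queryTermCount number of terms in the query
     * @param fieldTermCount number of unique terms in the field, or -1 if unknown
     * @param sumDocFreq     sum of doc frequencies over all terms in the field
     * @param docCount       number of docs that have at least one term in the field
     */
    static long estimateCost(long queryTermCount, long fieldTermCount,
                             long sumDocFreq, long docCount) {
        if (queryTermCount <= 0 || docCount <= 0) {
            return 0L;
        }
        if (fieldTermCount <= 0 || sumDocFreq <= 0) {
            // Assumption: with no usable statistics, fall back to the upper bound.
            return docCount;
        }
        long estimate;
        if (fieldTermCount == sumDocFreq) {
            // Primary-key case: every term has exactly one posting.
            estimate = queryTermCount;
        } else {
            // Average postings length, rounded up. This can badly underestimate
            // when the field is zipfian and the query hits frequent terms.
            long avg = sumDocFreq / fieldTermCount
                    + (sumDocFreq % fieldTermCount == 0 ? 0 : 1);
            // Saturate instead of overflowing.
            estimate = queryTermCount > Long.MAX_VALUE / avg
                    ? Long.MAX_VALUE
                    : queryTermCount * avg;
        }
        // Assumption: a scorer cannot produce more docs than have the field.
        return Math.min(estimate, docCount);
    }

    public static void main(String[] args) {
        System.out.println(estimateCost(10, 1000, 1000, 1000));  // 10 (primary key)
        System.out.println(estimateCost(10, 100, 5000, 2000));   // 500 (10 terms * avg 50)
        System.out.println(estimateCost(100, 100, 5000, 2000));  // 2000 (capped at doc count)
    }
}
```

Note that the average-based branch is exactly the one the description flags as dangerous for skewed fields, so a real implementation would likely want a more conservative bound there.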