[ 
https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348046#comment-17348046
 ] 

Zach Chen commented on LUCENE-9335:
-----------------------------------

{quote}Actually this matches my expectation. BMM and BMW differ in that BMM 
only makes a decision about which scorers lead iteration once per block, while 
BMW needs to make decisions on every document. So BMM collects more documents 
than BMW but BMW takes the risk that trying to be too smart makes things slower 
than a simpler approach.
{quote}
OK, I also took a further look at the TopDocsCollector code and confirmed that 
I had an incorrect understanding of "collect" and "hit count" earlier. 
This (and Michael's earlier response) makes total sense now!
{quote}Yes. You can download the "Collection" and "Queries" files from 
[https://microsoft.github.io/msmarco/#ranking] (make sure to accept terms at 
the top first so that download links are active).
{quote}
Thanks! I was able to download them. Will explore a bit more to see how they 
can be improved further.

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: MSMarcoPassages.java, wikimedium.10M.nostopwords.tasks, 
> wikimedium.10M.nostopwords.tasks.5OrMeds
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more. I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

