[ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054501#comment-17054501 ]
Michael Sokolov commented on LUCENE-8929:
-----------------------------------------

I posted a new revision that switches between max/min-based termination (what we had before this) and this min/min(?) termination for higher N (`numHits`), and we now get uniformly better, or the same, results on benchmarks. Actually I find the "max/min" terminology pretty confusing, since in fact sorting is generally *increasing*, so we are really interested in min/max and max/max; I tried to use "worst" score in most places to avoid this confusion. Anyway, here are the updated results:

## N=20
|| Task || QPS before || StdDev || QPS after || StdDev || Pct diff ||
| LowTermDayOfYearSort | 610.73 | (1.5%) | 609.69 | (1.0%) | -0.2% ( -2% - 2%) |
| HighTermDayOfYearSort | 1791.55 | (2.1%) | 1814.44 | (3.0%) | 1.3% ( -3% - 6%) |

## N=100
|| Task || QPS before || StdDev || QPS after || StdDev || Pct diff ||
| LowTermDayOfYearSort | 568.79 | (2.2%) | 588.81 | (0.5%) | 3.5% ( 0% - 6%) |
| HighTermDayOfYearSort | 1431.30 | (12.4%) | 1664.18 | (9.6%) | 16.3% ( -5% - 43%) |

## N=500
|| Task || QPS before || StdDev || QPS after || StdDev || Pct diff ||
| LowTermDayOfYearSort | 386.90 | (5.0%) | 585.41 | (6.0%) | 51.3% ( 38% - 65%) |
| HighTermDayOfYearSort | 482.69 | (7.7%) | 1017.13 | (30.5%) | 110.7% ( 67% - 161%) |

## N=1000
|| Task || QPS before || StdDev || QPS after || StdDev || Pct diff ||
| LowTermDayOfYearSort | 243.90 | (3.1%) | 547.16 | (12.1%) | 124.3% ( 105% - 144%) |
| HighTermDayOfYearSort | 272.67 | (3.4%) | 1041.77 | (33.4%) | 282.1% ( 237% - 330%) |

> Early Terminating CollectorManager
> ----------------------------------
>
>                 Key: LUCENE-8929
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8929
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Atri Sharma
>            Priority: Major
>          Time Spent: 7h
>  Remaining Estimate: 0h
>
> We should have an early terminating collector manager which accurately tracks
> hits across all of its collectors and determines when there are enough hits,
> allowing all the collectors to abort.
> The options for the same are:
> 1) Shared total count: Global "scoreboard" where all collectors update their
> current hit count. At the end of each document's collection, collector checks
> if N > threshold, and aborts if true
> 2) State Reporting Collectors: Collectors report their total number of counts
> collected periodically using a callback mechanism, and get a proceed or abort
> decision.
> 1) has the overhead of synchronization in the hot path, 2) can collect
> unnecessary hits before aborting.
> I am planning to work on 2), unless objections

--
This message was sent by Atlassian Jira (v8.3.4#803005)
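
For readers comparing the two options in the issue description, here is a rough Java sketch of the trade-off. It is not the patch attached to LUCENE-8929 and uses no Lucene classes: `SharedCountCollector`, `HitCoordinator`, `ReportingCollector`, and the `reportInterval` parameter are all hypothetical names, and the collectors are driven round-robin on a single thread purely to make the hit counts deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; none of these names are Lucene APIs.
final class EarlyTerminationSketch {

    /** Option 1: every collector updates one shared counter on every hit. */
    static final class SharedCountCollector {
        private final AtomicLong globalHits; // the shared "scoreboard"
        private final long threshold;

        SharedCountCollector(AtomicLong globalHits, long threshold) {
            this.globalHits = globalHits;
            this.threshold = threshold;
        }

        /** Returns true to proceed, false to abort (N > threshold). doc is unused here. */
        boolean collect(int doc) {
            // Atomic update in the hot path: this is the synchronization overhead.
            return globalHits.incrementAndGet() <= threshold;
        }
    }

    /** Option 2 coordinator: receives periodic reports and answers proceed/abort. */
    static final class HitCoordinator {
        private final AtomicLong reportedHits = new AtomicLong();
        private final long threshold;

        HitCoordinator(long threshold) {
            this.threshold = threshold;
        }

        /** Callback: adds newly collected hits; returns true to proceed. */
        boolean report(long newHits) {
            return reportedHits.addAndGet(newHits) <= threshold;
        }
    }

    /** Option 2 collector: counts locally, reports every reportInterval hits. */
    static final class ReportingCollector {
        private final HitCoordinator coordinator;
        private final int reportInterval; // assumed tuning knob
        private int unreported;
        private long collected;

        ReportingCollector(HitCoordinator coordinator, int reportInterval) {
            this.coordinator = coordinator;
            this.reportInterval = reportInterval;
        }

        boolean collect(int doc) {
            collected++;
            if (++unreported < reportInterval) {
                return true; // no shared state touched between reports
            }
            boolean proceed = coordinator.report(unreported);
            unreported = 0;
            return proceed;
        }

        long collected() {
            return collected;
        }
    }

    /** Drives option 1 round-robin; returns total hits collected before abort. */
    static long runSharedCount(int numCollectors, long threshold) {
        AtomicLong global = new AtomicLong();
        SharedCountCollector[] collectors = new SharedCountCollector[numCollectors];
        for (int i = 0; i < numCollectors; i++) {
            collectors[i] = new SharedCountCollector(global, threshold);
        }
        int doc = 0;
        while (true) {
            for (SharedCountCollector c : collectors) {
                if (!c.collect(doc++)) {
                    return global.get();
                }
            }
        }
    }

    /** Drives option 2 round-robin; returns total hits collected before abort. */
    static long runReporting(int numCollectors, long threshold, int reportInterval) {
        HitCoordinator coordinator = new HitCoordinator(threshold);
        ReportingCollector[] collectors = new ReportingCollector[numCollectors];
        for (int i = 0; i < numCollectors; i++) {
            collectors[i] = new ReportingCollector(coordinator, reportInterval);
        }
        int doc = 0;
        boolean proceed = true;
        while (proceed) {
            for (ReportingCollector c : collectors) {
                if (!c.collect(doc++)) {
                    proceed = false;
                    break;
                }
            }
        }
        long total = 0;
        for (ReportingCollector c : collectors) {
            total += c.collected();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("shared count: " + runSharedCount(2, 10));
        System.out.println("reporting:    " + runReporting(2, 10, 4));
    }
}
```

With two collectors, a threshold of 10, and a report interval of 4, the shared-count variant stops after 11 hits while the reporting variant stops after 15. The extra hits are the "unnecessary hits before aborting" cost of option 2, traded against touching shared state only once per interval instead of once per document.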