zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1168197720

> > I feel the effect would be similar?
>
> Indeed, sorry I had misread your code!

No worry, thanks still for the suggestion!

> No, it shouldn't matter. Bulk scorers sometimes help yield better performance because it's easier for them to amortize computation across docs, but if they don't yield better performance, there's no point in using a bulk scorer instead of a regular scorer.

Ok I see, makes sense.

> I agree that it looks like a great speedup, we should get this in! The benchmark only tests performance of top-level disjunctions of term queries that have two clauses. I'd be curious to get performance numbers for queries like the below ones to see if we need to fine-tune a bit more when this new scorer gets used. Note that I don't think we need to get the performance better for all these queries to merge the change, we could start by only using this new scorer for the (common) case of a top-level disjunction of 2 term queries, and later see if this scorer can handle more disjunctions.
>
> ```
> OrAndHigMedAndHighMed: (+including +looking) (+date +finished) # disjunction of conjunctions, which don't have as good score upper bounds as term queries
> OrHighPhraseHighPhrase: "united states" "new york" # disjunction of phrase queries, which don't have as good score upper bounds as term queries and are slow to advance
> AndHighOrMedMed: +be +(mostly interview) # disjunction within conjunction that leads iteration
> AndMedOrHighHigh: +interview +(at united) # disjunction within conjunction that doesn't lead iteration
> ```

Sounds good!
I have run these queries through the benchmark, and the results look somewhat consistent across three runs:

```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev                  Pct diff  p-value
OrHighPhraseHighPhrase           28.89   (8.7%)                    24.19   (4.7%)      -16.3% ( -27% - -3%)    0.000
AndHighOrMedMed                 101.24   (6.6%)                   101.09   (3.0%)        -0.1% ( -9% - 10%)    0.927
AndMedOrHighHigh                 81.44   (6.3%)                    81.62   (3.7%)         0.2% ( -9% - 10%)    0.895
OrAndHigMedAndHighMed           128.26   (7.0%)                   136.94   (3.7%)         6.8% ( -3% - 18%)    0.000
PKLookup                        221.47  (11.7%)                   236.93   (9.1%)        7.0% ( -12% - 31%)    0.035
```

```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev                  Pct diff  p-value
OrHighPhraseHighPhrase           27.73   (9.1%)                    23.73   (4.6%)       -14.4% ( -25% - 0%)    0.000
AndHighOrMedMed                  97.09  (13.1%)                    99.30   (4.3%)        2.3% ( -13% - 22%)    0.462
AndMedOrHighHigh                 75.87  (15.2%)                    80.04   (5.7%)        5.5% ( -13% - 31%)    0.128
PKLookup                        219.70  (15.7%)                   238.75  (12.4%)        8.7% ( -16% - 43%)    0.053
OrAndHigMedAndHighMed           121.83  (13.7%)                   134.79   (4.4%)        10.6% ( -6% - 33%)    0.001
```

```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev                  Pct diff  p-value
OrHighPhraseHighPhrase           27.42  (16.2%)                    23.99   (4.0%)       -12.5% ( -28% - 9%)    0.001
AndHighOrMedMed                  96.61  (15.8%)                   100.09   (3.6%)        3.6% ( -13% - 27%)    0.321
AndMedOrHighHigh                 75.72  (16.8%)                    79.53   (4.9%)        5.0% ( -14% - 32%)    0.200
OrAndHigMedAndHighMed           122.33  (16.9%)                   136.60   (4.5%)        11.7% ( -8% - 39%)    0.003
PKLookup                        207.94  (21.6%)                   233.10  (16.5%)       12.1% ( -21% - 63%)    0.046
```

Looks like we may need to restrict the scorer to only term queries, or improve it for phrase queries?
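
To read the three runs together, a rough sketch like the one below counts, per task, how many runs show a statistically significant slowdown or speedup. The `Pct diff` and `p-value` numbers are copied from the benchmark output in this comment. The 0.05 threshold and the helper names `summarize` and `consistent_regressions` are assumptions made for illustration and are not part of the benchmark tooling.

```python
from statistics import mean

# (task, pct_diff, p_value) rows copied from the three benchmark runs.
RUNS = [
    [
        ("OrHighPhraseHighPhrase", -16.3, 0.000),
        ("AndHighOrMedMed", -0.1, 0.927),
        ("AndMedOrHighHigh", 0.2, 0.895),
        ("OrAndHigMedAndHighMed", 6.8, 0.000),
        ("PKLookup", 7.0, 0.035),
    ],
    [
        ("OrHighPhraseHighPhrase", -14.4, 0.000),
        ("AndHighOrMedMed", 2.3, 0.462),
        ("AndMedOrHighHigh", 5.5, 0.128),
        ("PKLookup", 8.7, 0.053),
        ("OrAndHigMedAndHighMed", 10.6, 0.001),
    ],
    [
        ("OrHighPhraseHighPhrase", -12.5, 0.001),
        ("AndHighOrMedMed", 3.6, 0.321),
        ("AndMedOrHighHigh", 5.0, 0.200),
        ("OrAndHigMedAndHighMed", 11.7, 0.003),
        ("PKLookup", 12.1, 0.046),
    ],
]

ALPHA = 0.05  # assumed significance threshold, not taken from the benchmark


def summarize(runs, alpha=ALPHA):
    """Map task -> (mean pct diff, #runs significantly slower, #runs significantly faster)."""
    by_task = {}
    for run in runs:
        for task, pct, p in run:
            by_task.setdefault(task, []).append((pct, p))
    summary = {}
    for task, rows in by_task.items():
        slower = sum(1 for pct, p in rows if pct < 0 and p < alpha)
        faster = sum(1 for pct, p in rows if pct > 0 and p < alpha)
        summary[task] = (round(mean(pct for pct, _ in rows), 1), slower, faster)
    return summary


def consistent_regressions(runs, alpha=ALPHA):
    """Tasks that are significantly slower in every run."""
    return [
        task
        for task, (_, slower, _) in summarize(runs, alpha).items()
        if slower == len(runs)
    ]


if __name__ == "__main__":
    for task, (avg, slower, faster) in summarize(RUNS).items():
        print(f"{task:24s} mean {avg:+6.1f}%  slower in {slower}/3, faster in {faster}/3")
    print("consistent regressions:", consistent_regressions(RUNS))
```

Under these assumptions, only OrHighPhraseHighPhrase is significantly slower in all three runs, while OrAndHigMedAndHighMed is significantly faster in all three, which matches the reading that the phrase-query case is the one needing attention.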