[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347362#comment-17347362 ]
Zach Chen commented on LUCENE-9335:
-----------------------------------

{quote}The speedup for some of the slower queries looks great. I know Fuzzy1 and Fuzzy2 are quite noisy, but have you tried running them using BMM? Maybe your change makes them faster?{quote}

Ah, not sure why I didn't think of running them through BMM earlier! I just gave them a run and got the following results:

*BMM Scorer*
{code:java}
Task       QPS baseline  StdDev  QPS my_modified_version  StdDev   Pct diff               p-value
Fuzzy1            30.46 (24.7%)                    17.63 (11.6%)    -42.1% ( -62% -  -7%)   0.000
Fuzzy2            21.61 (16.4%)                    16.28 (12.0%)    -24.7% ( -45% -   4%)   0.000
PKLookup         216.72  (4.1%)                   215.63  (3.0%)     -0.5% (  -7% -   6%)   0.654
{code}
{code:java}
Task       QPS baseline  StdDev  QPS my_modified_version  StdDev   Pct diff               p-value
Fuzzy1            30.58  (9.1%)                    22.12  (6.4%)    -27.7% ( -39% - -13%)   0.000
Fuzzy2            36.07 (12.7%)                    27.05 (10.8%)    -25.0% ( -42% -  -1%)   0.000
PKLookup         215.26  (3.4%)                   213.99  (2.5%)     -0.6% (  -6% -   5%)   0.530
{code}

*BMMBulkScorer without window (with the above scorer implementation)*
{code:java}
Task       QPS baseline  StdDev  QPS my_modified_version  StdDev   Pct diff               p-value
Fuzzy2            16.32 (22.6%)                    15.68 (16.3%)     -3.9% ( -34% -  45%)   0.527
Fuzzy1            48.11 (17.6%)                    47.48 (13.6%)     -1.3% ( -27% -  36%)   0.791
PKLookup         213.67  (3.2%)                   212.52  (4.0%)     -0.5% (  -7% -   6%)   0.640
{code}
{code:java}
Task       QPS baseline  StdDev  QPS my_modified_version  StdDev   Pct diff               p-value
Fuzzy2            26.99 (23.2%)                    24.75 (13.6%)     -8.3% ( -36% -  37%)   0.169
PKLookup         216.27  (4.3%)                   216.43  (3.4%)      0.1% (  -7% -   8%)   0.951
Fuzzy1            19.01 (24.2%)                    20.01 (14.2%)      5.3% ( -26% -  57%)   0.400
{code}

*BMMBulkScorer with window size 1024*
{code:java}
Task       QPS baseline  StdDev  QPS my_modified_version  StdDev   Pct diff               p-value
Fuzzy2            23.56 (26.0%)                    19.08 (13.9%)    -19.0% ( -46% -  28%)   0.004
Fuzzy1            30.97 (31.6%)                    25.82 (16.9%)    -16.6% ( -49% -  46%)   0.038
PKLookup         213.23  (2.5%)                   211.63  (1.8%)     -0.7% (  -5% -   3%)   0.289
{code}
{code:java}
Task       QPS baseline  StdDev  QPS my_modified_version  StdDev   Pct diff               p-value
Fuzzy1            20.59 (12.1%)                    20.59 (10.5%)     -0.0% ( -20% -  25%)   0.994
PKLookup         205.21  (3.1%)                   206.99  (3.7%)      0.9% (  -5% -   7%)   0.422
Fuzzy2            30.74 (22.7%)                    32.71 (17.0%)      6.4% ( -27% -  59%)   0.311
{code}

These results actually look strange to me, as I would expect the BulkScorer-without-window variant to perform similarly to the scorer one, since it just uses the scorer implementation under the hood. I'll need to dig into it more to understand what contributed to these differences (their JFR CPU recordings look similar too).

From the results I have so far, it seems BMM may not be ideal for handling queries with many terms. My high-level guess is that for these queries, which can be rewritten into boolean queries with ~50 terms, BMM may spend a lot of time computing upTo and updating maxScore, since the minimum of all the scorers' block boundaries is used to update upTo each time (see the rough sketch further below for the kind of per-block work I mean). This would explain why the bulk scorer implementation with a fixed window size performs better than the scorer one, but it doesn't explain the difference above.

{quote}I wanted to do some more tests so I played with the MSMARCO passages dataset, which has the interesting property of having queries that have several terms (often around 8-10). See the attached benchmark if you are interested, here are the outputs I'm getting for various scorers: Contrary to my intuition, WAND seems to perform better despite the high number of terms. I wonder if there are some improvements we can still make to BMM?{quote}

Thanks for running these additional tests! The results indeed look interesting.
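Coming back to my guess about the upTo/maxScore bookkeeping above, here is a minimal sketch of the per-block work I have in mind. To be clear, this is just a rough illustration and not the attached patch: Scorer.advanceShallow() and Scorer.getMaxScore() are the real Lucene APIs, but the class itself, the method names, and the elided re-partitioning step are hypothetical.

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.lucene.search.Scorer;

// Rough sketch only (not the attached patch): the per-block bookkeeping that
// a MaxScore-style disjunction over N scorers has to repeat on every block
// transition. Scorer.advanceShallow() and Scorer.getMaxScore() are the real
// Lucene APIs; the class and the elided re-partitioning step are made up.
class BlockBoundarySketch {
  private final List<Scorer> scorers;
  private int upTo = -1;

  BlockBoundarySketch(List<Scorer> scorers) {
    this.scorers = scorers;
  }

  // Advance the shared block window past `target` and refresh max scores.
  void moveToNextBlock(int target) throws IOException {
    // upTo becomes the MINIMUM block boundary across all scorers, so with
    // ~50 terms the window advances in small steps and this runs frequently.
    upTo = Integer.MAX_VALUE;
    for (Scorer scorer : scorers) {
      int boundary = scorer.advanceShallow(Math.max(scorer.iterator().docID(), target));
      upTo = Math.min(upTo, boundary);
    }
    // Second O(N) pass: recompute each scorer's max score for the new block.
    for (Scorer scorer : scorers) {
      float blockMaxScore = scorer.getMaxScore(upTo);
      // ... re-partition scorers into essential / non-essential sets here,
      // comparing cumulative max scores against the minimum competitive score.
    }
  }
}
{code}

If something like this runs on every block transition, then each transition costs O(#scorers), and because upTo is the minimum boundary across all of them the blocks tend to be short, so with ~50 terms that cost would be paid very often.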
I took a look at the MSMarcoPassages.java code you attached, and wonder if it's also possible that, since the percentile numbers are computed after sorting, BMM does much better at some low percentiles (P10 for example) but worse than BMW for the rest (at least 50% of them)? I also noticed that the BMM bulk scorer collects roughly 10X as many docs as the BMM scorer, which in turn collects more than 10X as many docs as BMW. I feel this may also explain the unexpected slowdown? In general I would expect all of these scorers to collect the same number of top docs.

Also, I'm interested in running these benchmark tests as well. Are the passages dataset and the queries you used available for download somewhere (I found the MS GitHub site, but I'm not sure it has the same version as the one you used)?


> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: MSMarcoPassages.java, wikimedium.10M.nostopwords.tasks, wikimedium.10M.nostopwords.tasks.5OrMeds
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and PISA at [https://tantivy-search.github.io/bench/] or against research prototypes in Table 1 of [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for benchmarking, it would be nice to optimize this case a bit more. I suspect that we could make fewer per-document decisions by implementing a BulkScorer instead of a Scorer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)