[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337948#comment-17337948 ]
Zach Chen commented on LUCENE-9335:
-----------------------------------

I tried to modify the _CreateQueries_ class in luceneutil to generate OR queries with 5 clauses, but ran into issues running it. As a quick hack, I instead combined the queries from OrHighHigh, OrHighMed and OrHighLow into a new OrHighHighMedHighLow task. I've attached the resulting file _wikimedium.10M.nostopwords.tasks_ to this ticket.

Here are the luceneutil results from 2 runs for each implementation:

Scorer [https://github.com/apache/lucene/pull/101]
{code:java}
Task                  QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff                p-value
OrHighHighMedHighLow         30.97  (6.2%)                     24.92  (4.4%)   -19.5% ( -28% - -9%)    0.000
PKLookup                    223.53  (2.4%)                    228.10  (3.7%)   2.0% ( -3% - 8%)        0.037
{code}
{code:java}
Task                  QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff                p-value
OrHighHighMedHighLow         32.83  (3.4%)                     34.00  (5.1%)   3.6% ( -4% - 12%)       0.009
PKLookup                    217.86  (2.8%)                    228.14  (4.2%)   4.7% ( -2% - 12%)       0.000
{code}

BulkScorer [https://github.com/apache/lucene/pull/113]
{code:java}
Task                  QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff                p-value
PKLookup                    197.84  (4.1%)                    207.79  (4.2%)   5.0% ( -3% - 13%)       0.000
OrHighHighMedHighLow         32.50  (16.7%)                    35.79  (9.9%)   10.1% ( -14% - 44%)     0.020
{code}
{code:java}
Task                  QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff                p-value
OrHighHighMedHighLow         28.61  (5.4%)                     22.28  (4.2%)   -22.1% ( -30% - -13%)   0.000
PKLookup                    227.38  (2.6%)                    233.05  (2.7%)   2.5% ( -2% - 8%)        0.003
{code}

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: wikimedium.10M.nostopwords.tasks
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and
> PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more, I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)