jpountz commented on PR #12444:
URL: https://github.com/apache/lucene/pull/12444#issuecomment-1637854931

   Here is a table similar to the one above, but with low-cardinality clauses instead of high-cardinality clauses, to show how the overhead of the bitset manifests. Higher numbers are better:
   
   ```
   OrLow2: rivers sequence
   OrLow3: rivers sequence opposite
   OrLow4: rivers sequence opposite aug
   OrLow6: rivers sequence opposite aug ross bronze
   OrLow8: rivers sequence opposite aug ross bronze extension factor
   OrLow12: rivers sequence opposite aug ross bronze extension factor migration maintained norwegian visited
   OrLow16: rivers sequence opposite aug ross bronze extension factor migration maintained norwegian visited korean argentina developing billion
   ```
   
   | Task | BooleanScorer | WANDScorer within DefaultBulkScorer | MaxScoreBulkScorer (main) | MaxScoreBulkScorer (patch) |
   | -- | -- | -- | -- | -- |
   | OrLow2 | 283.3 | 353.0 | 427.2 🔶 | 398.1 🔷 |
   | OrLow3 | 210.3 | 278.6 🔶 | 270.0 | 220.1 🔷 |
   | OrLow4 | 171.7 | 198.3 🔶 | 190.0 | 163.5 🔷 |
   | OrLow6 | 124.5 | 114.7 🔶 | 112.3 | 108.5 🔷 |
   | OrLow8 | 97.3 | 77.5 🔶 | 77.1 | 81.6 🔷 |
   | OrLow12 | 68.2 | 44.7 🔶 | 50.1 | 56.5 🔷 |
   | OrLow16 | 52.3 | 31.1 🔶 | 36.0 | 42.6 🔷 |
   
   With high-frequency clauses, `MaxScoreBulkScorer` was consistently faster with this PR than on the main branch. With low-frequency clauses, this now only holds for queries with 8 or more clauses. Also, WAND is faster than MAXSCORE here for queries with 3 to 6 clauses.
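   To make the crossover concrete, here is a small Java sketch that divides the patch's numbers by main's for `MaxScoreBulkScorer` and finds the clause count from which the patch wins on that query and every larger one. It only reuses the values from the table above and assumes higher is better. The class and method names (`PatchVsMain`, `crossoverClauses`) are made up for illustration and are not part of Lucene.

   ```java
   import java.util.List;

   /** Hypothetical helper: compares MaxScoreBulkScorer numbers on the patch vs. main for the OrLow tasks. */
   public class PatchVsMain {

       /** One table row: task name, clause count, and the main/patch numbers (higher is better). */
       record Row(String task, int clauses, double mainValue, double patchValue) {
           /** Ratio patch / main; a value above 1.0 means the patch is faster on this task. */
           double speedup() {
               return patchValue / mainValue;
           }
       }

       /** Values copied from the "MaxScoreBulkScorer (main)" and "(patch)" columns, sorted by clause count. */
       static final List<Row> ROWS = List.of(
               new Row("OrLow2", 2, 427.2, 398.1),
               new Row("OrLow3", 3, 270.0, 220.1),
               new Row("OrLow4", 4, 190.0, 163.5),
               new Row("OrLow6", 6, 112.3, 108.5),
               new Row("OrLow8", 8, 77.1, 81.6),
               new Row("OrLow12", 12, 50.1, 56.5),
               new Row("OrLow16", 16, 36.0, 42.6));

       /** Smallest clause count such that the patch beats main for it and all larger queries; -1 if none. */
       static int crossoverClauses() {
           int crossover = -1;
           // Walk from the largest query down and stop at the first one where the patch is not faster.
           for (int i = ROWS.size() - 1; i >= 0 && ROWS.get(i).speedup() > 1.0; i--) {
               crossover = ROWS.get(i).clauses();
           }
           return crossover;
       }

       public static void main(String[] args) {
           for (Row row : ROWS) {
               System.out.printf("%-8s patch/main = %.2f%n", row.task(), row.speedup());
           }
           System.out.println("patch wins from " + crossoverClauses() + " clauses on");
       }
   }
   ```

   For these numbers `crossoverClauses()` returns 8, which matches the observation that the patch only beats main from 8 clauses on.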
   
   I'd like to avoid going too far in picking the optimal implementation based on the query, which could get quite messy. Maybe we could introduce simple heuristics in a follow-up, such as only using the bulk scorer if the cost is high enough that we'd expect more than X matches per 2048-doc window on average.
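   Here is a rough sketch of the heuristic suggested above. It is not something this PR implements. It estimates the expected number of matches per window by assuming that matches are spread uniformly across doc IDs, giving `min(1, cost / maxDoc) * 2048`. In Lucene, `cost` could come from something like `ScorerSupplier#cost()` and `maxDoc` from the leaf reader. The class name, method names and the example threshold below are hypothetical, and `X` is left as a parameter.

   ```java
   /**
    * Hypothetical sketch of "only use the bulk scorer if we expect more than X matches
    * per 2048-doc window on average". Names are illustrative, not Lucene APIs.
    */
   public final class BulkScorerHeuristic {

       /** Window size from the comment: 2048 doc IDs per window. */
       static final int WINDOW_SIZE = 2048;

       private BulkScorerHeuristic() {}

       /**
        * Estimated matches per window, assuming matches are uniformly spread over [0, maxDoc).
        * The density is capped at 1 because a cost estimate (e.g. a sum over disjunction clauses)
        * may exceed maxDoc.
        */
       static double expectedMatchesPerWindow(long cost, int maxDoc) {
           if (maxDoc <= 0 || cost <= 0) {
               return 0.0;
           }
           double density = Math.min(1.0, (double) cost / maxDoc);
           return density * WINDOW_SIZE;
       }

       /** True if the estimate exceeds the threshold X, passed here as minMatchesPerWindow. */
       static boolean shouldUseBulkScorer(long cost, int maxDoc, double minMatchesPerWindow) {
           return expectedMatchesPerWindow(cost, maxDoc) > minMatchesPerWindow;
       }

       public static void main(String[] args) {
           int maxDoc = 1_000_000;
           double x = 16; // arbitrary example threshold, not a value from the PR
           for (long cost : new long[] {1_000, 10_000, 100_000}) {
               System.out.printf("cost=%d -> %.2f matches/window, bulk=%b%n",
                       cost, expectedMatchesPerWindow(cost, maxDoc), shouldUseBulkScorer(cost, maxDoc, x));
           }
       }
   }
   ```

   With a threshold of 16 on a 1M-doc segment, a cost of 1,000 (about 2 expected matches per window) would skip the bulk scorer, while a cost of 10,000 (about 20) would use it. The uniform-spread assumption is the main simplification. Clustered doc IDs would make the real per-window density differ.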
   
   In general, this new `MaxScoreBulkScorer` feels like the best option to me, 
as it performs better on the slower queries that have high-frequency clauses, 
and its performance degrades more gracefully when the number of clauses 
increases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

