[GitHub] [lucene] gsmiller commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

via GitHub Fri, 27 Jan 2023 22:26:03 -0800


gsmiller commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1407307849


   > attach it to a github comment
   
   That works! Here's how I benchmarked. One note if you're interested in 
running this: make sure to shuffle the geonames data before indexing, or you'll 
get very skewed results. The file is sorted by country code, so if you index 
the lines in file order, the natural index sorting heavily favors a 
postings-based approach.
   [TiSBench.txt](https://github.com/apache/lucene/files/10526083/TiSBench.txt)
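   
   For anyone reproducing this, the shuffle step could look roughly like the 
sketch below. It is not part of the attached benchmark; the class name, method 
names, and fixed seed are made up for illustration. It reads the whole file 
into memory, so a large input needs a correspondingly large heap.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not taken from TiSBench.txt.
public class ShuffleLines {

    /** Returns a shuffled copy of the given lines; the input list is left untouched. */
    static List<String> shuffled(List<String> lines, long seed) {
        List<String> copy = new ArrayList<>(lines);
        // A fixed seed keeps the shuffled order reproducible across runs.
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    /** Reads every line of input, shuffles them, and writes them to output. */
    static void shuffleFile(Path input, Path output, long seed) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Files.write(output, shuffled(lines, seed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShuffleLines <sorted-input-file> <shuffled-output-file>
        shuffleFile(Paths.get(args[0]), Paths.get(args[1]), 42L);
    }
}
```

   Indexing the shuffled output rather than the original file removes the 
correlation between doc ID order and country code.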
   
   
   I'm actually a bit surprised/impressed that our existing `IndexOrDocValues` 
functionality works as well as it does across these queries, given how rough 
an estimate `#cost()` is on the current `TermInSetQuery`. There are some clear 
cases where term-seeking and using term-level stats in a heuristic helps make 
better decisions, but I expected the difference to be more pronounced.
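   
   To make the trade-off concrete, here is a rough sketch of the kind of 
decision being discussed: once the terms have been looked up, their document 
frequencies give a better bound on the postings work than an up-front cost 
estimate, and that bound can be compared against the cost of whatever is 
leading the iteration. This is not Lucene code and not the heuristic in the PR; 
the class, the method names, and the `factor` parameter are all hypothetical.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
public class TermInSetChoice {

    enum Strategy { POSTINGS, DOC_VALUES }

    /** Sums per-term document frequencies: an upper bound on postings to visit. */
    static long sumDocFreq(long[] docFreqs) {
        long total = 0;
        for (long df : docFreqs) {
            total += df;
        }
        return total;
    }

    /**
     * Picks postings when the postings work is within `factor` times the lead
     * cost; otherwise falls back to checking doc values per candidate document.
     * The factor is an assumed tuning knob, not a value from the source.
     */
    static Strategy choose(long postingsCost, long leadCost, long factor) {
        // Divide rather than multiply so a large leadCost cannot overflow.
        return postingsCost / factor <= leadCost ? Strategy.POSTINGS : Strategy.DOC_VALUES;
    }
}
```

   With exact per-term stats, a set of rare terms stays on the postings path 
even under a fairly selective lead, while a set containing very common terms 
switches to doc values.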


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

