benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2631840332
> I think this 'correlation' is important to test as I imagine many real world filters involve some correlation, rather than the random filters we get in luceneutil benchmarks.

I agree; however, random filters are also generally useful for modeling:

- Folks indexing multiple clients' data into the same graph (common for hosted multi-tenant)
- Filtering by timestamp
- Any amount of deleted docs (deletes are "filters")

But I eagerly await your results.

I am going to refactor this assuming we just always have it on at a given threshold (I am leaning towards 60% allowed vectors or lower as being the threshold).
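To make the threshold idea concrete, here is a minimal sketch of what such a check could look like. It is not code from this PR: the class, method names, and constant are all hypothetical, and the 0.60 default is only the figure floated above. Deleted docs are treated like any other filter, so they simply do not count toward the allowed vectors.

```java
// Hypothetical sketch; none of these names come from Lucene or from this PR.
final class FilterThresholdSketch {
    // Assumed default, taken from the "60% allowed vectors or lower" figure above.
    static final double DEFAULT_THRESHOLD = 0.60;

    // Fraction of vectors the search is allowed to return.
    // Deleted docs act as a filter, so they are excluded from allowedVectors.
    static double allowedFraction(long allowedVectors, long totalVectors) {
        if (totalVectors <= 0) {
            return 0.0; // empty graph: nothing is allowed
        }
        return (double) allowedVectors / totalVectors;
    }

    // True when the allowed fraction is at or below the threshold,
    // i.e. when the filter is restrictive enough to switch the behavior on.
    static boolean useFilteredStrategy(long allowedVectors, long totalVectors, double threshold) {
        return allowedFraction(allowedVectors, totalVectors) <= threshold;
    }
}
```

With these assumptions, a filter that allows 600 of 1,000 vectors would switch the behavior on, while one that allows 610 would not.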