[ https://issues.apache.org/jira/browse/LUCENE-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539583#comment-17539583 ]
Michael Sokolov commented on LUCENE-10559:
------------------------------------------

I think it makes sense to use a fixed bit set so that we can test HNSW performance with filtering independently from the cost of the filter Query. I think your test seems to be demonstrating that for similar latencies (~cost) we can achieve significantly higher recall with pre-filtering? I wonder if we could also demonstrate the converse -- what effective topK is required when post-filtering to drive recall to be the same as pre-filtering?

Also, these recall numbers seem curiously high, higher than we usually see. Could you publish the graph construction and HNSW search time parameters you used? I'm also wondering whether perhaps you tested with vectors from the training set?

> Add preFilter/postFilter options to KnnGraphTester
> --------------------------------------------------
>
>                 Key: LUCENE-10559
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10559
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>
> We want to be able to test the efficacy of pre-filtering in KnnVectorQuery:
> if you (say) want the top K nearest neighbors subject to a constraint Q, are
> you better off over-selecting (say 2K) top hits and *then* filtering
> (post-filtering), or incorporating the filtering into the query
> (pre-filtering). How does it depend on the selectivity of the filter?
> I think we can get a reasonable testbed by generating a uniform random filter
> with some selectivity (that is consistent and repeatable). Possibly we'd also
> want to try filters that are correlated with index order, but it seems they'd
> be unlikely to be correlated with vector values in a way that the graph
> structure would notice, so random is a pretty good starting point for this.
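
To make the "effective topK" question concrete, here is a minimal sketch in plain Python. It is not KnnGraphTester code and does not use HNSW or any Lucene API: it uses exact brute-force search, so the pre-filtered result is the ground truth by construction and only the post-filtering side is being measured. All function names (random_filter, exact_top_k, post_filter_top_k, min_fetch_for_full_recall) are hypothetical, as are the seeds, sizes and the squared-Euclidean distance. The filter is drawn from a seeded generator so that the same seed and selectivity always give the same accepted docs, and the query is drawn separately from the indexed vectors.

```python
import random


def random_filter(num_docs, selectivity, seed=42):
    """Uniform random filter: each doc passes with probability `selectivity`.
    A fixed seed makes the filter consistent and repeatable across runs."""
    rng = random.Random(seed)
    return [rng.random() < selectivity for _ in range(num_docs)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumption; any metric would do)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k, accept=None):
    """Brute-force nearest neighbors. With `accept`, only accepted docs are
    candidates, which is what pre-filtering means for an exact search."""
    cands = [i for i in range(len(vectors)) if accept is None or accept[i]]
    cands.sort(key=lambda i: sq_dist(query, vectors[i]))
    return cands[:k]


def post_filter_top_k(query, vectors, k, fetch, accept):
    """Post-filtering: take `fetch` unfiltered hits, drop rejected docs,
    then keep at most k of what survives."""
    hits = exact_top_k(query, vectors, fetch)
    return [i for i in hits if accept[i]][:k]


def recall(found, truth):
    """Fraction of the true neighbors that were found."""
    return len(set(found) & set(truth)) / len(truth)


def min_fetch_for_full_recall(query, vectors, k, accept):
    """Smallest over-selection size at which post-filtering returns the
    same hits as pre-filtering (the 'effective topK' for this query)."""
    truth = exact_top_k(query, vectors, k, accept)
    for fetch in range(k, len(vectors) + 1):
        if post_filter_top_k(query, vectors, k, fetch, accept) == truth:
            return fetch
    return None


# Small demo on synthetic data; the query is not one of the indexed vectors.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
query = [rng.gauss(0, 1) for _ in range(8)]
accept = random_filter(len(vectors), selectivity=0.1, seed=42)

k = 10
truth = exact_top_k(query, vectors, k, accept)
post = post_filter_top_k(query, vectors, k, 2 * k, accept)
print("recall with 2K over-selection:", recall(post, truth))
print("fetch needed to match:", min_fetch_for_full_recall(query, vectors, k, accept))
```

With an approximate graph search the pre-filtered result would no longer be exact, so this only illustrates how the required over-selection grows as the filter gets more selective, not how HNSW itself behaves.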