jtibshirani commented on code in PR #11790: URL: https://github.com/apache/lucene/pull/11790#discussion_r975678640
########## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ########## @@ -267,6 +267,9 @@ private NeighborQueue searchLevel( while (results.size() > topK) { results.pop(); } + if (level == 0 && results.size() < topK && numVisited < size) { Review Comment: I'm feeling unsure about this check -- I'm having trouble understanding the general principle. First, it only applies when a filter is present, but it seems like we could run into the problem even without filtering (in a highly disconnected graph?) Also, could it sometimes cause surprisingly slow searches? Let's say you had a filter that matched 90% of documents. Because the graph is disconnected, you return find fewer than `topK` matches. Then you fall back to an exact scan, but this needs to consider 90% of the index! For me it'd be better to just remove this test (as we chatted about in https://github.com/apache/lucene/issues/11787), since we generally don't have proper handling for poorly connected graphs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org