jtibshirani commented on code in PR #11790:
URL: https://github.com/apache/lucene/pull/11790#discussion_r975678640


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -267,6 +267,9 @@ private NeighborQueue searchLevel(
     while (results.size() > topK) {
       results.pop();
     }
+    if (level == 0 && results.size() < topK && numVisited < size) {

Review Comment:
   I'm feeling unsure about this check -- I'm having trouble understanding the 
general principle. First, it only applies when a filter is present, but it 
seems like we could run into the problem even without filtering (in a highly 
disconnected graph?)
   
   Also, could it sometimes cause surprisingly slow searches? Let's say you had 
a filter that matched 90% of documents. Because the graph is disconnected, you 
return find fewer than `topK` matches. Then you fall back to an exact scan, but 
this needs to consider 90% of the index!
   
   For me it'd be better to just remove this test (as we chatted about in 
https://github.com/apache/lucene/issues/11787), since we generally don't have 
proper handling for poorly connected graphs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to