mikemccand commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1837944575
########## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ########## @@ -406,6 +411,21 @@ public static final class VectorValuesStatus { public Throwable error; } + /** Status from testing HNSW graph */ + public static final class HnswGraphStatus { + + HnswGraphStatus() {} + + /** Number of nodes in the graph */ + public int hnswGraphSize; + + /** Number of levels of the graph */ + public int hsnwGraphNumLevels; Review Comment: `graphNumLevels`? Naming is the hardest part! ########## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ########## @@ -406,6 +411,21 @@ public static final class VectorValuesStatus { public Throwable error; } + /** Status from testing HNSW graph */ + public static final class HnswGraphStatus { + + HnswGraphStatus() {} + + /** Number of nodes in the graph */ + public int hnswGraphSize; Review Comment: Maybe rename to `graphNumNodes` or so? `size` is a bit ambiguous/tricky, and since `hnsw` is in the class name we can prolly leave it off the member name? ########## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ########## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status; } + private static HnswGraph getHnswGraph(CodecReader reader) throws IOException { + KnnVectorsReader vectorsReader = reader.getVectorReader(); + if (vectorsReader instanceof PerFieldKnnVectorsFormat.FieldsReader) { + vectorsReader = ((PerFieldKnnVectorsFormat.FieldsReader) vectorsReader).getFieldReader("knn"); Review Comment: Hmm the field can be anything (not necessarily `knn` like `luceneutil` uses)? Can we change this so we iterate `FieldInfos` and for any field that has KNN vectors enabled (`FieldInfo.hasVectorValues`, hmm but also has the HNSW graph (since I think a field can be only "flat" vectors?)) we do this check? ########## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ########## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status; } + private static HnswGraph getHnswGraph(CodecReader reader) throws IOException { + KnnVectorsReader vectorsReader = reader.getVectorReader(); + if (vectorsReader instanceof PerFieldKnnVectorsFormat.FieldsReader) { + vectorsReader = ((PerFieldKnnVectorsFormat.FieldsReader) vectorsReader).getFieldReader("knn"); + if (vectorsReader instanceof HnswGraphProvider) { + return ((HnswGraphProvider) vectorsReader).getGraph("knn"); + } + } + return null; + } + + /** Test the HNSW graph. */ + public static Status.HnswGraphStatus testHnswGraph( + CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException { + if (infoStream != null) { + infoStream.print(" test: hnsw graph.........."); + } + long startNS = System.nanoTime(); + Status.HnswGraphStatus status = new Status.HnswGraphStatus(); + + try { + HnswGraph hnswGraph = getHnswGraph(reader); + if (hnswGraph != null) { + final int numLevels = hnswGraph.numLevels(); + // Perform tests on each level of the HNSW graph + for (int level = numLevels - 1; level >= 0; level--) { + HnswGraph.NodesIterator nodesIterator = hnswGraph.getNodesOnLevel(level); + while (nodesIterator.hasNext()) { + int node = nodesIterator.nextInt(); + hnswGraph.seek(level, node); + int nbr, lastNeighbor = -1, firstNeighbor = -1; + while ((nbr = hnswGraph.nextNeighbor()) != NO_MORE_DOCS) { + if (firstNeighbor == -1) { + firstNeighbor = nbr; + } + if (nbr < lastNeighbor) { + throw new CheckIndexException( + "Neighbors out of order for node " + + node + + ": " + + nbr + + "<" + + lastNeighbor + + " 1st=" + + firstNeighbor); + } else if (nbr == lastNeighbor) { + throw new CheckIndexException( + "There are repeated neighbors of node " + node + " with value " + nbr); + } + lastNeighbor = nbr; + } + status.hnswGraphSize++; + } + status.hsnwGraphNumLevels++; Review Comment: Instead of incrementing here, you could just set it to `numLevels` from above? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org