benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838595160
##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors(
     return status;
   }
 
+  private static HnswGraph getHnswGraph(CodecReader reader) throws IOException {
+    KnnVectorsReader vectorsReader = reader.getVectorReader();
+    if (vectorsReader instanceof PerFieldKnnVectorsFormat.FieldsReader) {
+      vectorsReader = ((PerFieldKnnVectorsFormat.FieldsReader) vectorsReader).getFieldReader("knn");

Review Comment:
Thanks, I didn't quite understand fields when I wrote this - I think I get it now.

Alright, I've done what you suggested (as is also done in `testVectors`) and iterated over `FieldInfos`, performing the check only for fields it applies to. Because we might now parse several HNSW graphs, I've restructured the status object to support per-graph data. Successful output will now look like:

```
test: open reader.........OK [took 0.010 sec]
test: check integrity.....OK [took 2.216 sec]
test: check live docs.....OK [took 0.000 sec]
test: field infos.........OK [2 fields] [took 0.000 sec]
test: field norms.........OK [0 fields] [took 0.000 sec]
test: terms, freq, prox...
test: stored fields.......OK [1500000 total field count; avg 1.0 fields per doc] [took 0.390 sec]
test: term vectors........OK [0 total term vector count; avg 0.0 term/freq vector fields per doc] [took 0.000 sec]
test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET; 0 SKIPPING INDEX] [took 0.000 sec]
test: points..............OK [0 fields, 0 points] [took 0.000 sec]
test: vectors.............OK [1 fields, 1500000 vectors] [took 0.496 sec]
test: hnsw graphs.........OK [2 fields: (field name: knn1, levels: 4, total nodes: 1547684), (field name: knn2, levels: 4, total nodes: 1547684)] [took 0.979 sec]
```

`testVectors` doesn't report per-field data; it just sums vectors over all fields. I could do that here too, but the per-graph breakdown felt most complete.
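To make the per-graph shape concrete, here is a minimal, self-contained sketch of how a status keyed by field name could produce the `hnsw graphs` summary shown above. It is an illustration only, not the code in this PR: `HnswStatusSketch`, `GraphInfo`, and `formatSummary` are hypothetical names, and the sketch deliberately avoids Lucene classes so it runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of a per-graph status; not the actual CheckIndex classes. */
final class HnswStatusSketch {

  /** Data recorded for one vector field's HNSW graph (assumed shape). */
  record GraphInfo(int levels, int totalNodes) {}

  /** Builds a summary in the style of the "hnsw graphs" line in the output above. */
  static String formatSummary(Map<String, GraphInfo> graphsByField) {
    StringJoiner perField = new StringJoiner(", ");
    for (Map.Entry<String, GraphInfo> entry : graphsByField.entrySet()) {
      GraphInfo graph = entry.getValue();
      perField.add(
          "(field name: " + entry.getKey()
              + ", levels: " + graph.levels()
              + ", total nodes: " + graph.totalNodes() + ")");
    }
    return "OK [" + graphsByField.size() + " fields: " + perField + "]";
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps fields in insertion order, so the output is stable.
    Map<String, GraphInfo> graphs = new LinkedHashMap<>();
    graphs.put("knn1", new GraphInfo(4, 1547684));
    graphs.put("knn2", new GraphInfo(4, 1547684));
    System.out.println(formatSummary(graphs));
  }
}
```

Keying the data by field name is what lets the summary list each graph separately instead of collapsing everything into one total, which is the difference from the `testVectors` summary.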
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org