benchaplin commented on PR #13984:
URL: https://github.com/apache/lucene/pull/13984#issuecomment-2566774509
I've added @msokolov's node-in-range and neighbor-on-same-level checks, and @tteofili's connectivity check. I chose not to throw an exception for disconnectedness, because I've found there are often a couple of disconnected nodes at level 0 (at least in my indices). Maybe we could throw if connectivity falls below some threshold?

So overall, exceptions are thrown if:
- A node is not in the range [0, graphSize - 1]
- A node's neighbor is not on the same level as the node
- A node's neighbors are out of order
- A node's neighbors contain duplicates

I also implemented a more detailed report in the logs: node counts and connectivity are printed for each level of the graph, e.g.:

```
test: open reader.........OK [took 0.012 sec]
test: check integrity.....OK [took 0.489 sec]
test: check live docs.....OK [took 0.000 sec]
test: field infos.........OK [2 fields] [took 0.000 sec]
test: field norms.........OK [0 fields] [took 0.000 sec]
test: terms, freq, prox...
test: stored fields.......OK [200000 total field count; avg 1.0 fields per doc] [took 0.055 sec]
test: term vectors........OK [0 total term vector count; avg 0.0 term/freq vector fields per doc] [took 0.000 sec]
test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET; 0 SKIPPING INDEX] [took 0.000 sec]
test: points..............OK [0 fields, 0 points] [took 0.000 sec]
test: vectors.............OK [1 fields, 200000 vectors] [took 0.111 sec]
test: hnsw graphs.........OK [1 fields] [took 0.317 sec]
hnsw field name: knn
level 3: 11 nodes, 11/11 connected
level 2: 141 nodes, 141/141 connected
level 1: 6022 nodes, 6022/6022 connected
level 0: 200000 nodes, 199999/200000 connected
```

-- This is an automated message from the Apache Git Service.
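To make the four structural checks and the per-level connectivity report concrete, here is a minimal Java sketch. It is not the PR's code and does not use Lucene's `HnswGraph` API. The graph representation (one `Map` per level from node id to an `int[]` of neighbor ids), the names `HnswGraphCheckSketch`, `checkLevel` and `LevelStats`, and the definition of "connected" (reachable from a single entry node by breadth-first search over outgoing edges) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of per-level HNSW graph checks. The graph representation (one map per level from
 * node id to a neighbor array) is hypothetical and is NOT Lucene's HnswGraph API.
 */
public class HnswGraphCheckSketch {

  /** Node count and number of nodes reachable from the entry node on one level. */
  record LevelStats(int level, int nodeCount, int connectedCount) {
    @Override
    public String toString() {
      return "level " + level + ": " + nodeCount + " nodes, "
          + connectedCount + "/" + nodeCount + " connected";
    }
  }

  static LevelStats checkLevel(int level, Map<Integer, int[]> adjacency, int graphSize, int entryNode) {
    for (Map.Entry<Integer, int[]> entry : adjacency.entrySet()) {
      int node = entry.getKey();
      // Check 1: node id must lie in [0, graphSize - 1].
      if (node < 0 || node >= graphSize) {
        throw new IllegalStateException(
            "level " + level + ": node " + node + " not in [0, " + (graphSize - 1) + "]");
      }
      int[] neighbors = entry.getValue();
      for (int i = 0; i < neighbors.length; i++) {
        int nbr = neighbors[i];
        // Check 2: every neighbor must itself be a node on this level.
        if (!adjacency.containsKey(nbr)) {
          throw new IllegalStateException(
              "level " + level + ": neighbor " + nbr + " of node " + node + " is not on this level");
        }
        if (i > 0) {
          // Checks 3 and 4: neighbor ids must be strictly increasing (sorted, no duplicates).
          if (nbr == neighbors[i - 1]) {
            throw new IllegalStateException(
                "level " + level + ": node " + node + " has duplicate neighbor " + nbr);
          }
          if (nbr < neighbors[i - 1]) {
            throw new IllegalStateException(
                "level " + level + ": neighbors of node " + node + " are out of order");
          }
        }
      }
    }
    // Connectivity is only reported, not enforced: BFS over outgoing edges from the entry node.
    BitSet visited = new BitSet(graphSize);
    Deque<Integer> queue = new ArrayDeque<>();
    if (adjacency.containsKey(entryNode)) {
      visited.set(entryNode);
      queue.add(entryNode);
    }
    while (!queue.isEmpty()) {
      for (int nbr : adjacency.get(queue.poll())) {
        if (!visited.get(nbr)) {
          visited.set(nbr);
          queue.add(nbr);
        }
      }
    }
    return new LevelStats(level, adjacency.size(), visited.cardinality());
  }

  public static void main(String[] args) {
    int graphSize = 5;
    int entryNode = 0; // assumed to be present on every level, as in HNSW
    Map<Integer, int[]> level1 = new TreeMap<>(Map.of(0, new int[] {3}, 3, new int[] {0}));
    Map<Integer, int[]> level0 = new TreeMap<>(Map.of(
        0, new int[] {1, 3},
        1, new int[] {0, 2},
        2, new int[] {1},
        3, new int[] {0},
        4, new int[] {2})); // nothing links to node 4, so it is unreachable
    List<Map<Integer, int[]>> levels = List.of(level0, level1);
    // Print from the top level down, mirroring the log format in the comment.
    for (int level = levels.size() - 1; level >= 0; level--) {
      System.out.println(checkLevel(level, levels.get(level), graphSize, entryNode));
    }
    try {
      checkLevel(0, Map.of(0, new int[] {1, 1}, 1, new int[] {0}), graphSize, entryNode);
    } catch (IllegalStateException e) {
      System.out.println("check failed: " + e.getMessage());
    }
  }
}
```

Running `main` prints `level 1: 2 nodes, 2/2 connected`, then `level 0: 5 nodes, 4/5 connected`, then `check failed: level 0: node 0 has duplicate neighbor 1`. As in the comment, disconnected nodes are only counted rather than treated as an error; a threshold-based failure could be layered on top by comparing `connectedCount` to `nodeCount` against some cutoff.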