benchaplin commented on PR #13984:
URL: https://github.com/apache/lucene/pull/13984#issuecomment-2566774509

   I've added @msokolov's node-in-range check + neighbor-on-same-level check, 
and @tteofili's connectivity check. I chose not to throw an exception for 
disconnectedness, as I've found there are often a couple of disconnected nodes at 
level 0 (at least in my indices). Maybe we could throw if connectivity is below 
some threshold? 
   
   So overall, exceptions are thrown if:
   - A node is not in the range [0, graphSize - 1]
   - A node's neighbor is not on the same level as the node
   - A node's neighbors are out of order 
   - A node's neighbors contain duplicates
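
   As a rough illustration of the four conditions above, here is a minimal standalone sketch. It is not the code from this PR and does not use Lucene's `HnswGraph` API: `NeighborListCheck`, `checkNeighbors`, and the plain `int[]` inputs are hypothetical names chosen for the example, and it assumes "in order" means ascending node ids.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): validates one node's neighbor list on one level. */
public final class NeighborListCheck {

  /**
   * @param node         the node whose neighbor list is being checked
   * @param neighbors    neighbor ids in the order they are stored
   * @param nodesOnLevel ids of all nodes present on this level, sorted ascending
   * @param graphSize    total number of nodes in the graph
   * @throws IllegalStateException if any of the four conditions is violated
   */
  public static void checkNeighbors(int node, int[] neighbors, int[] nodesOnLevel, int graphSize) {
    checkInRange(node, graphSize);
    int previous = -1; // valid ids are >= 0, so -1 never collides with a real neighbor
    for (int neighbor : neighbors) {
      checkInRange(neighbor, graphSize);
      // Binary search is valid only because nodesOnLevel is assumed sorted.
      if (Arrays.binarySearch(nodesOnLevel, neighbor) < 0) {
        throw new IllegalStateException(
            "Neighbor " + neighbor + " of node " + node + " is not on the same level");
      }
      if (neighbor == previous) {
        throw new IllegalStateException(
            "Node " + node + " has duplicate neighbor " + neighbor);
      }
      if (neighbor < previous) {
        throw new IllegalStateException(
            "Neighbors of node " + node + " are out of order: " + previous + " before " + neighbor);
      }
      previous = neighbor;
    }
  }

  /** Throws if the id is outside [0, graphSize - 1]. */
  private static void checkInRange(int id, int graphSize) {
    if (id < 0 || id >= graphSize) {
      throw new IllegalStateException(
          "Node " + id + " is not in the range [0, " + (graphSize - 1) + "]");
    }
  }
}
```

   Because the list is required to be ascending, comparing each neighbor with its predecessor is enough to catch both duplicates and ordering violations in a single pass.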
   
   I implemented a more detailed report in the logs: node counts and 
connectivity are printed for each level of the graph, e.g.
   ```
   test: open reader.........OK [took 0.012 sec]
   test: check integrity.....OK [took 0.489 sec]
   test: check live docs.....OK [took 0.000 sec]
   test: field infos.........OK [2 fields] [took 0.000 sec]
   test: field norms.........OK [0 fields] [took 0.000 sec]
   test: terms, freq, prox...    test: stored fields.......OK [200000 total field count; avg 1.0 fields per doc] [took 0.055 sec]
   test: term vectors........OK [0 total term vector count; avg 0.0 term/freq vector fields per doc] [took 0.000 sec]
   test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET; 0 SKIPPING INDEX] [took 0.000 sec]
   test: points..............OK [0 fields, 0 points] [took 0.000 sec]
   test: vectors.............OK [1 fields, 200000 vectors] [took 0.111 sec]
   test: hnsw graphs.........OK [1 fields] [took 0.317 sec]
     hnsw field name: knn
       level 3: 11 nodes, 11/11 connected
       level 2: 141 nodes, 141/141 connected
       level 1: 6022 nodes, 6022/6022 connected
       level 0: 200000 nodes, 199999/200000 connected
   ```
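
   For readers who want to see how a per-level "connected" count and the threshold idea could fit together, here is a hedged sketch. It is not the implementation in this PR: it assumes "connected" means reachable from a given entry node by breadth-first search over that level's neighbor lists, and `LevelConnectivity`, `countConnected`, `meetsThreshold`, `report`, and the `Map<Integer, int[]>` representation are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): measures connectivity of one graph level. */
public final class LevelConnectivity {

  /**
   * Counts nodes reachable from entryNode on a single level.
   *
   * @param neighborsByNode maps each node on this level to its neighbor ids
   * @param entryNode       node to start the breadth-first search from
   */
  public static int countConnected(Map<Integer, int[]> neighborsByNode, int entryNode) {
    if (!neighborsByNode.containsKey(entryNode)) {
      return 0;
    }
    Set<Integer> visited = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    visited.add(entryNode);
    queue.add(entryNode);
    while (!queue.isEmpty()) {
      int current = queue.poll();
      for (int neighbor : neighborsByNode.get(current)) {
        // Ignore ids that are not on this level; enqueue each node only once.
        if (neighborsByNode.containsKey(neighbor) && visited.add(neighbor)) {
          queue.add(neighbor);
        }
      }
    }
    return visited.size();
  }

  /** True if the connected fraction is at least minRatio; an empty level passes trivially. */
  public static boolean meetsThreshold(int connected, int total, double minRatio) {
    return total == 0 || (double) connected / total >= minRatio;
  }

  /** Formats one line in the same shape as the per-level report above. */
  public static String report(int level, int connected, int total) {
    return "level " + level + ": " + total + " nodes, " + connected + "/" + total + " connected";
  }
}
```

   With a check like `meetsThreshold`, the level 0 result above (199999 of 200000) would pass a 0.999 threshold, while a badly fragmented level would not. The right threshold value is an open question.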


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

