mikemccand commented on issue #15509:
URL: https://github.com/apache/lucene/issues/15509#issuecomment-3666422833

   Oooh +1!  Isn't `CheckIndex` already walking the HNSW graph to check
integrity (at least that there are no duplicated transitions)?  So maybe
computing these stats and printing them when users ask for `-verbose` is
simple?  Those stats can be helpful.  For example, you could compare two graphs
that are supposed to be similar (say, indexed from the same set of documents
but in a different order) and gauge their aggregate statistics (histograms
showing how bushy the nodes generally are).
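
   As a rough illustration of such per-level stats, here is a minimal Java sketch. It assumes a hypothetical in-memory layout where `level[node]` is the array of neighbor ids for each node on one graph level. This is not Lucene's `HnswGraph` API, and `FanoutStats`, `summarize` and `firstNodeWithDuplicateNeighbor` are made-up names. It computes min/mean/max fanout and flags the first node that lists the same neighbor twice (the duplicated-transition check mentioned above).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical layout for illustration only (NOT Lucene's HnswGraph API):
// level[node] holds the neighbor ids of that node on one graph level.
class FanoutStats {

  // Aggregate fanout (neighbor count) statistics for one level.
  record Summary(int size, int min, double mean, int max) {}

  static Summary summarize(int[][] level) {
    if (level.length == 0) {
      return new Summary(0, 0, 0.0, 0);
    }
    int min = Integer.MAX_VALUE;
    int max = 0;
    long total = 0;
    for (int[] neighbors : level) {
      min = Math.min(min, neighbors.length);
      max = Math.max(max, neighbors.length);
      total += neighbors.length;
    }
    return new Summary(level.length, min, (double) total / level.length, max);
  }

  // Returns the first node that lists the same neighbor twice, or -1 if there is none.
  static int firstNodeWithDuplicateNeighbor(int[][] level) {
    for (int node = 0; node < level.length; node++) {
      Set<Integer> seen = new HashSet<>();
      for (int neighbor : level[node]) {
        if (!seen.add(neighbor)) {
          return node;
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    int[][] level0 = { {1, 2, 3}, {0, 2}, {0, 1}, {0} };
    Summary s = summarize(level0);
    // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale.
    System.out.println(String.format(Locale.ROOT,
        "size=%d, Fanout min=%d, mean=%.2f, max=%d", s.size(), s.min(), s.mean(), s.max()));
    // prints "size=4, Fanout min=1, mean=2.00, max=3"
    System.out.println("first node with duplicate neighbor: " + firstNodeWithDuplicateNeighbor(level0));
    // prints "first node with duplicate neighbor: -1"
  }
}
```

Running this over each level of two graphs built from the same documents would give directly comparable summary lines.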
   
   We tell HNSW construction it can add up to `maxConn` edges to each node, but 
it often/typically uses fewer.  Here's a recent output from one of my 
`knnPerfTest.py` runs:
   
   ```
   Graph level=2 size=51, Fanout min=4, mean=9.33, max=14, meandelta=25557.37
   %   0  10  20  30  40  50  60  70  80  90 100
       0   5   7   7   9   9  10  10  12  13  14
   Graph level=1 size=5870, Fanout min=11, mean=40.85, max=64, meandelta=9241.01
   %   0  10  20  30  40  50  60  70  80  90 100
       0  24  28  31  35  39  43  49  56  64  64
   Graph level=0 size=400000, Fanout min=1, mean=63.43, max=128, meandelta=6004.07
   %   0  10  20  30  40  50  60  70  80  90 100
       0  31  38  44  49  56  64  74  89 115 128
   Graph level=2 size=51, connectedness=1.00
   Graph level=1 size=5870, connectedness=1.00
   Graph level=0 size=400000, connectedness=1.00
   ```
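
   The `connectedness` lines could be computed in several ways. Here is a minimal sketch that assumes (a guess, not necessarily what `knnPerfTest.py` does) connectedness means the fraction of a level's nodes reachable from node 0 by following neighbor edges. It uses the same hypothetical `level[node]` neighbor-array layout, and `Connectedness`/`reachableFraction` are made-up names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Assumed definition for illustration: fraction of a level's nodes reachable from node 0
// via breadth-first search over outgoing neighbor edges. knnPerfTest's definition may differ.
// Hypothetical layout: level[node] holds neighbor ids in 0..level.length-1.
class Connectedness {

  static double reachableFraction(int[][] level) {
    if (level.length == 0) {
      return 1.0; // convention chosen here: an empty level counts as fully connected
    }
    boolean[] visited = new boolean[level.length];
    Deque<Integer> queue = new ArrayDeque<>();
    visited[0] = true;
    queue.add(0);
    int reached = 1;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int neighbor : level[node]) {
        if (!visited[neighbor]) {
          visited[neighbor] = true;
          reached++;
          queue.add(neighbor);
        }
      }
    }
    return (double) reached / level.length;
  }

  public static void main(String[] args) {
    int[][] cycle = { {1}, {2}, {0} };         // 0 -> 1 -> 2 -> 0: everything reachable
    int[][] twoPairs = { {1}, {0}, {3}, {2} }; // {0,1} and {2,3} never link up
    System.out.println(reachableFraction(cycle));    // prints 1.0
    System.out.println(reachableFraction(twoPairs)); // prints 0.5
  }
}
```

Because HNSW neighbor lists need not be symmetric, this follows outgoing edges only; a value below 1.00 would mean some nodes cannot be reached from node 0 by search.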
   
   So at level=0 (all vectors), the median (P50) node has 56 neighbors.
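
   For reference, here is one way to compute such a decile row: a sketch using the nearest-rank percentile definition over the same hypothetical `level[node]` neighbor-array layout. `FanoutDeciles` and `deciles` are made-up names, and `knnPerfTest`'s own computation may use a different percentile definition.

```java
import java.util.Arrays;

// Sketch: fanout deciles (P0, P10, ..., P100) using the nearest-rank definition.
// Hypothetical layout (NOT Lucene's HnswGraph API): level[node] holds that node's neighbor ids.
class FanoutDeciles {

  static int[] deciles(int[][] level) {
    int n = level.length;
    int[] fanouts = new int[n];
    for (int i = 0; i < n; i++) {
      fanouts[i] = level[i].length;
    }
    Arrays.sort(fanouts);
    int[] out = new int[11];
    if (n == 0) {
      return out;
    }
    for (int d = 0; d <= 10; d++) {
      int p = d * 10;
      // Nearest rank = ceil(p * n / 100), computed in integer arithmetic; P0 maps to the minimum.
      int rank = (int) ((p * (long) n + 99) / 100);
      out[d] = fanouts[Math.max(0, rank - 1)];
    }
    return out;
  }

  public static void main(String[] args) {
    // Toy level where node i has i + 1 neighbors (fanouts 1..10); only the lengths matter here.
    int[][] level = new int[10][];
    for (int i = 0; i < 10; i++) {
      level[i] = new int[i + 1];
    }
    System.out.println(Arrays.toString(deciles(level)));
    // prints [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  }
}
```

With this definition P100 is always the maximum fanout, so it should match the `max=` value on the corresponding `Graph level=` line.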
   
   Hmm, why is P100 128?  I had run this with `maxConn=64`.  Are we somehow 
doubling this somewhere?  Maybe `knnPerfTest` is doing something fishy?
   
   See!  This is why such transparency is so helpful :)


