abernardi597 commented on PR #15472:
URL: https://github.com/apache/lucene/pull/15472#issuecomment-3647265760

   > Lucene's HNSW merging has exactly this optimization

   I've been working on some modifications to further align the two implementations. For example, I have added changes to do single-threaded graph construction on the indexing thread (instead of buffering all the docs and building the graph in parallel at flush time).
   
   I am working on the graph-reuse piece, though it looks like Lucene also does a smarter merge: it inserts key nodes from the smaller graph so that the small graph's adjacency information can seed the graph search when inserting the remaining nodes. JVector does not do this at the moment, but it would likely benefit from such a change (possibly as an upstream contribution).
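   To make the reuse idea concrete, here is a rough Java sketch of that merge shape: copy the largest segment's adjacency lists (remapped into the merged ordinal space) and run full graph insertion only for the remaining nodes. All names here (`mergeGraphs`, `GraphInserter`) are hypothetical, not Lucene's or JVector's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GraphReuseMerge {

  /** Hypothetical insertion hook standing in for the real graph builder. */
  interface GraphInserter {
    void insert(Map<Integer, List<Integer>> graph, int node);
  }

  /** Reuse the largest segment's graph wholesale; only the smaller
   *  segments' nodes pay the full insertion cost. */
  static Map<Integer, List<Integer>> mergeGraphs(
      Map<Integer, List<Integer>> largestGraph,
      int[] oldToNewOrdinal,            // ordinal remap for the largest segment
      List<Integer> remainingNodes,     // merged ordinals from other segments
      GraphInserter inserter) {
    Map<Integer, List<Integer>> merged = new HashMap<>();
    // 1. Reuse: copy the big graph's adjacency lists, remapping ordinals.
    for (Map.Entry<Integer, List<Integer>> e : largestGraph.entrySet()) {
      List<Integer> remapped = new ArrayList<>();
      for (int neighbor : e.getValue()) {
        remapped.add(oldToNewOrdinal[neighbor]);
      }
      merged.put(oldToNewOrdinal[e.getKey()], remapped);
    }
    // 2. Insert: run normal graph construction for everything else.
    for (int node : remainingNodes) {
      inserter.insert(merged, node);
    }
    return merged;
  }

  public static void main(String[] args) {
    // Two-node graph from the "largest" segment, old ordinals 0 and 1,
    // remapped to merged ordinals 5 and 6; ordinal 7 comes from elsewhere.
    Map<Integer, List<Integer>> big = Map.of(0, List.of(1), 1, List.of(0));
    Map<Integer, List<Integer>> merged = mergeGraphs(
        big, new int[] {5, 6}, List.of(7),
        (g, n) -> g.put(n, new ArrayList<>(g.keySet())));  // toy inserter
    System.out.println(merged.get(5));  // [6]
  }
}
```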
   
   > how does your PR here compare to that original PR?

   I looked at the original PR as a starting point, but found that there were several key changes in the upstream OpenSearch implementation that could be brought in. Merging those commits seemed unwieldy, so I opted to start from scratch by [checking out the codec](https://github.com/apache/lucene/pull/15472/commits/7843b82f6cdc1f3075a2ff5f489039f20c2c7c0a) into the `sandbox` module. Then I fixed the build and style issues before changing how the codec works to get closer to functional parity with Lucene's HNSW codecs. Specifically, I am trying to get the extra KNN tests passing and to move toward the single-indexing-thread model mentioned above.
   
   > Does Lucene's HNSW have an analogue for neighborOverflow=2 and alpha=2

   We have found that `alpha=2` is actually partially responsible for the increase in index/search time. `alpha > 1` is a hyperparameter that relaxes the diversity check by a multiplicative factor; `alpha=1` is the same diversity check as HNSW's. We found that `alpha=2` produced graphs in which every node was saturated with edges (`maxConn` edges), which significantly slowed down both construction and search.
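   For readers unfamiliar with the knob, here is a minimal sketch of a DiskANN-style pruning pass with the `alpha` relaxation, assuming Euclidean distance; the names are illustrative, not JVector's actual code. With `alpha=1`, a candidate is dropped whenever it is closer to an already-selected neighbor than to the node itself (the HNSW diversity heuristic); with `alpha=2`, far fewer candidates are dropped, which is how edge lists end up saturated at `maxConn`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AlphaPrune {

  /** Select up to maxConn neighbors for `p` from `candidates`, nearest
   *  first. A candidate c is dropped when some already-selected neighbor s
   *  satisfies alpha * d(s, c) <= d(p, c): alpha = 1 is the classic
   *  diversity check; alpha > 1 relaxes it and keeps more edges. */
  static List<float[]> prune(float[] p, List<float[]> candidates,
                             double alpha, int maxConn) {
    List<float[]> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparingDouble(c -> dist(p, c)));
    List<float[]> selected = new ArrayList<>();
    for (float[] c : sorted) {
      if (selected.size() >= maxConn) break;
      boolean occluded = false;
      for (float[] s : selected) {
        if (alpha * dist(s, c) <= dist(p, c)) {
          occluded = true;
          break;
        }
      }
      if (!occluded) selected.add(c);
    }
    return selected;
  }

  static double dist(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    float[] p = {0, 0};
    List<float[]> cands = List.of(
        new float[] {1, 0}, new float[] {0, 2}, new float[] {2.1f, 0});
    // alpha = 1: {2.1, 0} is closer to the already-selected {1, 0} than
    // to p, so the diversity check drops it.
    System.out.println(prune(p, cands, 1.0, 8).size());  // 2
    // alpha = 2: the check is relaxed and all three candidates survive.
    System.out.println(prune(p, cands, 2.0, 8).size());  // 3
  }
}
```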
   
   There is also a `hierarchyEnabled` flag that adds layers to the graph in 
much the same fashion as the `H` in HNSW.
   Enabling the hierarchy with `alpha=1` and also allowing `2*maxConn` for 
`level=0` gives somewhat more promising results:
   
   recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index(s) | index_doc/s | force_merge(s) | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType | metric
   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
   0.926 | 2.262 | 2.2 | 0.973 | 100000 | 100 | 50 | 64 | 250 | no | 3277 | 12.07 | 8282.26 | 0.01 | 1 | 319.21 | 292.969 | 292.969 | JVECTOR | COSINE
   0.926 | 10.106 | 9.95 | 0.985 | 100000 | 100 | 50 | 64 | 250 | 8 bits | 3238 | 196.74 | 508.27 | 0.01 | 1 | 393.46 | 367.737 | 74.768 | JVECTOR | COSINE
   0.926 | 3.444 | 3.386 | 0.983 | 100000 | 100 | 50 | 64 | 250 | 4 bits | 3189 | 75.32 | 1327.6 | 0.01 | 1 | 356.71 | 331.116 | 38.147 | JVECTOR | COSINE
   0.739 | 1.15 | 1.122 | 0.976 | 100000 | 100 | 50 | 64 | 250 | 1 bits | 2581 | 22.24 | 4496.4 | 0.01 | 1 | 329.15 | 303.459 | 10.49 | JVECTOR | COSINE
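   For reference, the `hierarchyEnabled` flag behaves like standard HNSW layer assignment, and the `2*maxConn` allowance at `level=0` matches HNSW's usual base-layer convention. A minimal sketch of both (illustrative names, not JVector's actual code):

```java
import java.util.Random;

public class HnswLevels {

  /** Geometric level draw: a node reaches level L with probability about
   *  exp(-L / mL), so upper layers thin out exponentially. */
  static int assignLevel(double mL, Random rnd) {
    double u = 1.0 - rnd.nextDouble();  // in (0, 1], avoids log(0)
    return (int) Math.floor(-Math.log(u) * mL);
  }

  /** HNSW convention: the base layer allows twice as many edges. */
  static int maxConnAt(int level, int maxConn) {
    return level == 0 ? 2 * maxConn : maxConn;
  }

  public static void main(String[] args) {
    double mL = 1.0 / Math.log(64);  // a common choice: 1 / ln(maxConn)
    Random rnd = new Random(42);
    int aboveBase = 0;
    for (int i = 0; i < 100_000; i++) {
      if (assignLevel(mL, rnd) > 0) aboveBase++;
    }
    // Only about 1/64 of nodes are expected above the base layer.
    System.out.println("nodes above level 0: " + aboveBase);
    System.out.println(maxConnAt(0, 64));  // 128
    System.out.println(maxConnAt(3, 64));  // 64
  }
}
```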
   
   Combining this with the single-threaded indexing mentioned above lets me run a more apples-to-apples test: 32 indexing threads and 32 merge threads, with a final force-merge, for both codecs:
   
   recall | latency(ms) | netCPU | avgCpuCount | numDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index(s) | index_doc/s | force_merge(s) | total_index(s) | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType
   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
   0.960 | 1.796 | 1.764 | 0.982 | 200000 | 100 | 50 | 64 | 250 | no | 5596 | 12.81 | 15612.80 | 24.58 | 37.39 | 1 | 596.91 | 585.938 | 585.938 | HNSW
   0.904 | 2.416 | 2.371 | 0.981 | 200000 | 100 | 50 | 64 | 250 | no | 3321 | 15.42 | 12972.69 | 73.05 | 88.47 | 1 | 686.89 | 585.938 | 585.938 | JVECTOR
   0.894 | 1.391 | 1.363 | 0.980 | 200000 | 100 | 50 | 64 | 250 | 4 bits | 5661 | 18.55 | 10784.00 | 21.93 | 40.48 | 1 | 672.01 | 662.231 | 76.294 | HNSW
   0.903 | 3.923 | 3.862 | 0.984 | 200000 | 100 | 50 | 64 | 250 | 4 bits | 3274 | 15.56 | 12850.99 | 107.83 | 123.39 | 1 | 760.87 | 662.231 | 76.294 | JVECTOR
   0.661 | 0.887 | 0.867 | 0.977 | 200000 | 100 | 50 | 64 | 250 | 1 bits | 6552 | 17.25 | 11594.20 | 19.71 | 36.96 | 1 | 617.32 | 606.918 | 20.981 | HNSW
   0.724 | 1.252 | 1.229 | 0.982 | 200000 | 100 | 50 | 64 | 250 | 1 bits | 2704 | 15.52 | 12888.26 | 35.89 | 51.41 | 1 | 705.95 | 606.918 | 20.981 | JVECTOR
   
   I'm nearly at a point where I can re-use the largest graph at merge-time, 
but I'm working through an elusive duplicate neighbor bug.
   
   > making it easy-ish for anyone to benchmark jVector against Faiss wrapped 
Codec and Lucene's HNSW implementation would be awesome
   
   Apologies for the delay here; I am working on re-applying my changes on top of these awesome improvements!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

