kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2628899940
I found one way to reduce index-time RAM usage -- it turns out the [`FlatVectorsWriter`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorsWriter.java) already maintains a [list](https://github.com/apache/lucene/blob/faec0f823817ca95f1f103d6b9482d26ee75cc7b/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsWriter.java#L407) of all vectors on heap before [writing them to disk](https://github.com/apache/lucene/blob/faec0f823817ca95f1f103d6b9482d26ee75cc7b/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsWriter.java#L170-L177) on flush, so we don't need to wrap it in a [`BufferingKnnVectorsWriter`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/BufferingKnnVectorsWriter.java).

In the flat format, peak RAM usage is ~2x the vector size (once on heap, and once in the buffer allocated just before writing to disk), and steady-state heap usage is ~1x the vector size (all vectors on heap).

In the Faiss format, we can read these vectors, copy them over to the native process, and start indexing them. Peak RAM usage here is ~3x the vector size (once on heap -- reusing the flat format's vectors, once as a copy in the native process, and once inside the native index). Previously we maintained two copies of the vectors on heap (once in the flat format and once in the buffering writer).

We could reduce peak RAM usage further by indexing vectors in batches (limiting the size of the native copy required) -- but this would hurt indexing performance (see the batching sketch at the end of this comment).

At merge time, we now read the disk-backed vectors from the flat format and add them directly to the native process, so no copy on heap is required (see the merge sketch at the end of this comment).

---

Single segment, no merges

Lucene:
```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.812         1.389  200000   100      50       32        200         no   146.49       1365.31           0.01             1           236.93        228.882       228.882
```

Faiss:
```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.811         1.103  200000   100      50       32        200         no   148.38       1347.90           0.01             1           511.97        228.882       228.882
```

---

Single segment, with merges

Lucene:
```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.809         1.366  200000   100      50       32        200         no   103.58       1930.95         116.78             1           236.92        228.882       228.882
```

Faiss:
```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.811         1.104  200000   100      50       32        200         no   114.90       1740.64         145.93             1           511.97        228.882       228.882
```

Merges are probably slower because we start from scratch instead of [adding to existing indexes](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java).
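For reference, here's a minimal sketch of the merge-time path described above, assuming the `KnnVectorValues` / `MergedVectorValues` APIs on `main` -- the merged view is disk-backed, so no full copy of the vectors lands on the Java heap. `addToNativeIndex` is a hypothetical placeholder for the FFI call into the native Faiss index, not an API from this PR:

```java
import java.io.IOException;
import org.apache.lucene.codecs.KnnVectorsWriter;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.KnnVectorValues;
import org.apache.lucene.index.MergeState;
import org.apache.lucene.search.DocIdSetIterator;

// Sketch only -- not the exact code in this PR
void mergeOneFieldSketch(FieldInfo fieldInfo, MergeState mergeState) throws IOException {
  // Disk-backed view over the vectors of all segments being merged: nothing is
  // buffered on the Java heap
  FloatVectorValues merged =
      KnnVectorsWriter.MergedVectorValues.mergeFloatVectorValues(fieldInfo, mergeState);

  KnnVectorValues.DocIndexIterator it = merged.iterator();
  for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
    float[] vector = merged.vectorValue(it.index()); // reads one vector from disk
    addToNativeIndex(it.index(), vector);            // hypothetical native call
  }
}
```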
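And a sketch of the batching idea mentioned above -- it caps the transient native-side copy at `batchSize` vectors, at the cost of more FFI round trips (which is why I'd expect it to hurt indexing throughput). Again, `addBatchToNativeIndex` is a hypothetical placeholder, not the actual binding:

```java
// Sketch only: add vectors to the native index in fixed-size batches, so the
// transient native-side copy is capped at `batchSize` vectors instead of the
// whole dataset
void addInBatches(FloatVectorValues vectors, int batchSize) throws IOException {
  int dim = vectors.dimension();
  float[] batch = new float[batchSize * dim]; // staging buffer for the native copy
  int inBatch = 0;

  KnnVectorValues.DocIndexIterator it = vectors.iterator();
  for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
    System.arraycopy(vectors.vectorValue(it.index()), 0, batch, inBatch * dim, dim);
    if (++inBatch == batchSize) {
      addBatchToNativeIndex(batch, inBatch); // hypothetical native call
      inBatch = 0;
    }
  }
  if (inBatch > 0) {
    addBatchToNativeIndex(batch, inBatch);   // flush the final partial batch
  }
}
```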