mikemccand commented on issue #13403:
URL: https://github.com/apache/lucene/issues/13403#issuecomment-2144941653
> As an aside, this "wait to build the index" thing could also be done for HNSW. Tiny segments with quick flushes probably shouldn't even build HNSW graphs. Instead, they should probably store the float vectors flat (or the scalar-quantized vectors flat, since scalar quantization is effectively linear in runtime). Then, when a threshold is reached (it could be small, something like 1k or 10k?), we create the HNSW graphs.

Oooh -- +1 to exploring this idea as a precursor, separately from enabling/exploring dimensionality-reduction compression. Lucene's write-once segments really make such optimizations (different choices depending on a segment's size or the characteristics of the documents in each segment) possible and worthwhile! Maybe open a spinoff issue for this one?
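For concreteness, here's a minimal, self-contained Java sketch of the idea. The class name, method names, and the 10k threshold are purely illustrative (not real Lucene codec APIs): a tiny flush-time segment keeps its vectors flat and is searched exactly by brute force, which is cheap at that scale, while only segments past the threshold would pay for HNSW graph construction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Illustrative sketch only: a per-segment policy that skips HNSW graph
 * construction for tiny segments and falls back to exact brute-force search
 * over the flat vectors. All names and the threshold are hypothetical.
 */
public class DeferredGraphPolicy {

  /** Segments below this many vectors skip graph construction (1k-10k was floated above). */
  static final int GRAPH_BUILD_THRESHOLD = 10_000;

  static boolean shouldBuildGraph(int numVectors) {
    return numVectors >= GRAPH_BUILD_THRESHOLD;
  }

  /** Exact top-k by dot product over flat vectors: the search path for graph-less segments. */
  static List<Integer> bruteForceTopK(float[][] flatVectors, float[] query, int k) {
    // Min-heap keyed on score so the worst current hit is evicted first.
    PriorityQueue<float[]> heap =
        new PriorityQueue<>(Comparator.comparingDouble((float[] hit) -> hit[1]));
    for (int doc = 0; doc < flatVectors.length; doc++) {
      float score = 0f;
      for (int i = 0; i < query.length; i++) {
        score += flatVectors[doc][i] * query[i];
      }
      heap.offer(new float[] {doc, score});
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<Integer> topDocs = new ArrayList<>();
    while (!heap.isEmpty()) {
      topDocs.add(0, (int) heap.poll()[0]); // min-heap pops worst first, so prepend
    }
    return topDocs;
  }

  public static void main(String[] args) {
    // A quick 1,200-doc flush stays flat; a 50k-doc merged segment builds HNSW.
    System.out.println("1200 docs  -> build graph? " + shouldBuildGraph(1_200));
    System.out.println("50000 docs -> build graph? " + shouldBuildGraph(50_000));

    float[][] tinySegment = {{1f, 0f}, {0f, 1f}, {0.7f, 0.7f}};
    System.out.println(
        "top-2 for query [1,0]: " + bruteForceTopK(tinySegment, new float[] {1f, 0f}, 2));
  }
}
```

In real Lucene terms this decision would presumably live down in the vector format/writer layer (around `KnnVectorsFormat`), with the deferred graph built when segments merge past the threshold, but that wiring is beyond this sketch.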