benwtrent commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1731463225

   > Do we know why search is faster? Is it mostly because working on the quantized vectors requires a lower memory bandwidth?
   
   Search is faster in two regards:
   
    - PanamaVector allows more `byte` operations to happen at once than `float32` (should be the major factor; see the sketch below)
    - Reading `byte[]` off a buffer doesn't require decoding floats (a very minor change)
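
   To make the first point concrete, here's a minimal sketch in the spirit of Lucene's Panama-based byte dot product (illustrative, not the PR's exact code; `int8DotProduct` is a made-up name, and it needs `--add-modules jdk.incubator.vector`). Bytes are widened to `short` for the multiply and to `int` for accumulation so nothing overflows:

   ```java
   import jdk.incubator.vector.ByteVector;
   import jdk.incubator.vector.IntVector;
   import jdk.incubator.vector.ShortVector;
   import jdk.incubator.vector.Vector;
   import jdk.incubator.vector.VectorOperators;

   static int int8DotProduct(byte[] a, byte[] b) {
     IntVector acc = IntVector.zero(IntVector.SPECIES_256);
     int i = 0;
     int upperBound = ByteVector.SPECIES_64.loopBound(a.length);
     for (; i < upperBound; i += ByteVector.SPECIES_64.length()) {
       // one 64-bit load covers 8 int8 dimensions; the same-width
       // float32 load would only cover 2
       ByteVector va8 = ByteVector.fromArray(ByteVector.SPECIES_64, a, i);
       ByteVector vb8 = ByteVector.fromArray(ByteVector.SPECIES_64, b, i);
       // widen byte -> short so the per-lane products cannot overflow
       Vector<Short> va16 = va8.convertShape(VectorOperators.B2S, ShortVector.SPECIES_128, 0);
       Vector<Short> vb16 = vb8.convertShape(VectorOperators.B2S, ShortVector.SPECIES_128, 0);
       // widen short -> int for accumulation
       Vector<Integer> prod32 =
           va16.mul(vb16).convertShape(VectorOperators.S2I, IntVector.SPECIES_256, 0);
       acc = acc.add(prod32);
     }
     int res = acc.reduceLanes(VectorOperators.ADD);
     for (; i < a.length; i++) { // scalar tail
       res += a[i] * b[i];
     }
     return res;
   }
   ```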
   
   IMO, we should be seeing WAY better search numbers. I need to do more testing to triple-check.
   
   > Do you know how recall degrades compared to without quantization? I saw 
the numbers you shared but I don't have a good sense of what recall we usually 
had until now.
   
   ++ I want to graph the two together so the comparison is clearer.

   > I don't feel great about the logic that merges quantiles at merge time and 
only requantizes if the merged quantiles don't differ too much from the input 
quantiles. It feels like quantiles could slowly change over multiple merging 
rounds and we'd end up in a state where the quantized vectors would be 
different from requantizing the raw vectors with the quantization state that is 
stored in the segment, which feels wrong. Am I missing something?
   
   The quantization buckets could change slightly over time, but since we are bucketing `float32` into `int8`, each bucket covers a comparatively wide range of values, so small quantile shifts rarely change which bucket a vector component lands in.
   
   The cost of requantization is almost never worth it. In my testing, quantiles computed over random samples from the same data set differ across segments by only around `1e-4`, which is tiny and shouldn't require requantization.
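
   To make the bucket-width argument concrete, here's a rough sketch of linear scalar quantization (illustrative only, not the PR's exact `ScalarQuantizer` math):

   ```java
   // Clamp to the quantile range and map onto 256 buckets. A quantile
   // shift of ~1e-4 moves a value by a tiny fraction of one bucket
   // width, so it almost always lands in the same bucket.
   static byte quantize(float v, float lowerQuantile, float upperQuantile) {
     float clamped = Math.max(lowerQuantile, Math.min(upperQuantile, v));
     float scale = 255f / (upperQuantile - lowerQuantile);
     int bucket = Math.round((clamped - lowerQuantile) * scale); // 0..255
     return (byte) (bucket - 128); // shift into the signed int8 range
   }
   ```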
   
   @tveasey helped me do some empirical analysis here and can provide some 
numbers.
   
   
   > Related to the above, it looks like we ignore deletions when merging 
quantiles. It would probably be ok in practice most of the time but I worry 
that there might be corner cases?
   
   A corner case in what way? That we potentially include deleted documents' vectors when computing quantiles, or when deciding whether requantization is required?
   
   We can easily exclude them: conceptually, the "new" doc (if it was an update) already lives in another segment, so including the deleted copy would double-count a vector, and we probably shouldn't do that. A sketch of what excluding them could look like, assuming the 9.x `FloatVectorValues` iterator API:
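
   ```java
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.lucene.index.FloatVectorValues;
   import org.apache.lucene.search.DocIdSetIterator;
   import org.apache.lucene.util.Bits;

   // Hypothetical helper (not in the PR): sample only live documents'
   // vectors for quantile estimation, so an updated doc isn't counted in
   // both its old (deleted) and new segment.
   static List<float[]> sampleLiveVectors(FloatVectorValues values, Bits liveDocs)
       throws IOException {
     List<float[]> sample = new ArrayList<>();
     for (int doc = values.nextDoc();
         doc != DocIdSetIterator.NO_MORE_DOCS;
         doc = values.nextDoc()) {
       if (liveDocs == null || liveDocs.get(doc)) {
         // vectorValue() reuses its array, so copy before storing
         sample.add(values.vectorValue().clone());
       }
     }
     return sample;
   }
   ```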
   
   > > Do we want to have a new "flat" vector codec that HNSW (or other 
complicated vector indexing methods), can use? Detractor here is that now HNSW 
codec relies on another pluggable thing that is a "flat" vector index (just 
provides mechanisms for reading, writing, merging vectors in a flat index).
   
   > I don't have a strong opinion on this. Making it a codec though has the 
downside that it would require more files since two codecs can't write to the 
same file. Maybe having utility methods around reading/writing flat vectors is 
good enough?
   
   Utility methods are honestly what I am leaning towards. It's then a discussion around how a codec (like HNSW) is configured to use them.
   
   > > Should "quantization" just be a thing that is provided to vector codecs?
   
   > I might be misunderstanding the question, but to me this is what the 
byte[] encoding is about. And this quantization that's getting added here is 
more powerful because it's adaptive and will change over time depending on 
what vectors get indexed or deleted? If it needs to adapt to the data then it 
belongs to the codec. We could have utility code to make it easier to write 
codecs that quantize their data though (maybe this is what your question 
suggested?).
   
   Yeah, it needs to adapt over time. There are adverse cases (indexing vectors sorted by relative clusters is one) that need to be handled. But they can be handled easily at merge time by recomputing quantiles and potentially re-quantizing; a rough sketch of that merge-time decision follows.
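
   A rough sketch of that decision (the names, size-weighted combination, and tolerance are illustrative assumptions, not the PR's exact logic):

   ```java
   record Quantiles(float lower, float upper) {}

   // Size-weighted combination of per-segment quantiles.
   static Quantiles mergeQuantiles(Quantiles[] qs, int[] liveVectorCounts) {
     long total = 0;
     double lower = 0, upper = 0;
     for (int i = 0; i < qs.length; i++) {
       lower += (double) qs[i].lower() * liveVectorCounts[i];
       upper += (double) qs[i].upper() * liveVectorCounts[i];
       total += liveVectorCounts[i];
     }
     return new Quantiles((float) (lower / total), (float) (upper / total));
   }

   // Re-quantize only if some segment's quantiles drift too far from the
   // merged ones; otherwise keep the already-quantized vectors as-is.
   static boolean shouldRequantize(Quantiles merged, Quantiles[] qs, float tolerance) {
     for (Quantiles q : qs) {
       if (Math.abs(q.lower() - merged.lower()) > tolerance
           || Math.abs(q.upper() - merged.upper()) > tolerance) {
         return true;
       }
     }
     return false;
   }
   ```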
   
   > > Should the "quantizer" keep the raw vectors around itself?
   
   > My understanding is that we have to, as the accuracy of the quantization 
could otherwise degrade over time in an unbounded fashion.
   
   After a period of time, if vectors are part of the same corpus and created via the same model, the quantiles level out, and re-quantizing will rarely or never occur since the calculated quantiles are statistically equivalent, especially given the coarse binning into `int8`.

