benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1731463225
> Do we know why search is faster? Is it mostly because working on the quantized vectors requires a lower memory bandwi[d]th?

Search is faster in two regards:

- Panama Vector allows more `byte` operations to occur at once than `float32` (should be major; see the lane-width sketch at the end of this reply)
- Reading `byte[]` off of a buffer doesn't require decoding floats (very minor change)

IMO, we should be seeing WAY better search numbers. I need to do more testing to triple-check.

> Do you know how recall degrades compared to without quantization? I saw the numbers you shared but I don't have a good sense of what recall we usually had until now.

++ I want to graph the two together so the comparison is clearer.

> I don't feel great about the logic that merges quantiles at merge time and only requantizes if the merged quantiles don't differ too much from the input quantiles. It feels like quantiles could slowly change over multiple merging rounds and we'd end up in a state where the quantized vectors would be different from requantizing the raw vectors with the quantization state that is stored in the segment, which feels wrong. Am I missing something?

The quantization buckets could change slightly over time, but since we are bucketing `float32` into `int8`, the error bounds are comparatively large and the cost of requantization is almost never worth it. In my testing, quantiles computed over random samples of the same data set differ across segments by only around `1e-4`, which is tiny and shouldn't require requantization. @tveasey helped me do some empirical analysis here and can provide some numbers. (A sketch of the merge-time decision is at the end of this reply.)

> Related to the above, it looks like we ignore deletions when merging quantiles. It would probably be ok in practice most of the time but I worry that there might be corner cases?

A corner case in what way? That we potentially include deleted documents when computing quantiles, or that re-quantization would be required? We can easily exclude them since, conceptually, the "new" doc (if it were an update) would exist in another segment. It could be that we are double counting a vector, and we probably shouldn't do that.

> > Do we want to have a new "flat" vector codec that HNSW (or other complicated vector indexing methods) can use? The detractor here is that the HNSW codec would then rely on another pluggable thing that is a "flat" vector index (which just provides mechanisms for reading, writing, and merging vectors in a flat index).
>
> I don't have a strong opinion on this. Making it a codec though has the downside that it would require more files since two codecs can't write to the same file. Maybe having utility methods around reading/writing flat vectors is good enough?

Utility methods are honestly what I am leaning towards. It's then a discussion around how a codec (like HNSW) is configured to use them.

> > Should "quantization" just be a thing that is provided to vector codecs?
>
> I might be misunderstanding the question, but to me this is what the byte[] encoding is about. And this quantization that's getting added here is more powerful because it's adaptative and will change over time depending on what vectors get indexed or deleted? If it needs to adapt to the data then it belongs to the codec. We could have utility code to make it easier to write codecs that quantize their data though (maybe this is what your question suggested?).

Yeah, it needs to adapt over time. There are adverse cases (indexing vectors sorted by relative clusters is one) that need to be handled. But they can be handled easily at merge time by recomputing quantiles and potentially re-quantizing.
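For reference, here is a minimal, self-contained sketch (not the PR's actual code; the class name is illustrative) of why Panama gives `byte` vectors an edge: the preferred SIMD species holds four times as many `int8` lanes as `float32` lanes, so each vectorized step covers 4x the vector components.

```java
// Minimal sketch, not Lucene's implementation: compare how many int8 vs
// float32 lanes the preferred SIMD species holds on this machine.
// Compile/run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.FloatVector;

public class LaneWidthSketch {
  public static void main(String[] args) {
    // On AVX2 hardware this typically prints 32 byte lanes vs 8 float lanes,
    // i.e. a single vectorized step touches 4x as many vector components.
    System.out.println("byte lanes:  " + ByteVector.SPECIES_PREFERRED.length());
    System.out.println("float lanes: " + FloatVector.SPECIES_PREFERRED.length());
  }
}
```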
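And a rough sketch of the merge-time decision discussed above, under the assumption that each input segment carries its (lower, upper) quantiles plus a vector count. All names and the tolerance value are illustrative, not the PR's actual API:

```java
static final float REQUANTIZE_TOLERANCE = 0.1f; // illustrative fraction of the quantile range

static float[] mergeQuantiles(float[] lowers, float[] uppers, int[] counts) {
  long total = 0;
  double lower = 0, upper = 0;
  for (int i = 0; i < counts.length; i++) {
    total += counts[i];
    lower += (double) lowers[i] * counts[i];
    upper += (double) uppers[i] * counts[i];
  }
  // Count-weighted mean of the per-segment quantiles.
  return new float[] {(float) (lower / total), (float) (upper / total)};
}

static boolean shouldRequantize(float[] lowers, float[] uppers, float[] merged) {
  float range = merged[1] - merged[0];
  for (int i = 0; i < lowers.length; i++) {
    // Re-quantize a segment's raw vectors only when its quantiles diverge
    // from the merged quantiles by a meaningful fraction of the range.
    float drift = Math.max(Math.abs(lowers[i] - merged[0]),
                           Math.abs(uppers[i] - merged[1]));
    if (drift / range > REQUANTIZE_TOLERANCE) {
      return true;
    }
  }
  return false;
}
```

With observed drift on the order of `1e-4` relative to typical quantile ranges, a check like this would essentially never trigger, which matches the empirical results above.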
> > Should the "quantizer" keep the raw vectors around itself? > My understanding is that we have to, as the accuracy of the quantization could otherwise degrade over time in an unbounded fashion. After a period of time, if vectors are part of the same corpus and created via the same model, the quantiles actually level out and re-quantizing will rarely or never occur since the calculated quantiles are statistically equivalent. Especially given the binning into `int8`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org