[jira] [Commented] (LUCENE-10577) Quantize vector values

Robert Muir (Jira) Wed, 13 Jul 2022 01:50:06 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566223#comment-17566223
 ]


Robert Muir commented on LUCENE-10577:
--------------------------------------

By the way, if the right answer is that really different "widths" should be all 
supported due to different user needs (e.g. 1-byte, 2-byte, existing 4-byte), 
then perhaps FieldInfo instead is the right place to hold this, with *Field 
impls supporting 'byte' / 'short' / 'float' values, respectively. It would 
require codecs to support the three different types, but it wouldn't have any 
trappy lossiness and would be straightforward.

I still think the 2-byte case is interesting on newer hardware, with support 
such as https://bugs.openjdk.org/browse/JDK-8214751 already in openjdk. Too bad 
for this issue that the 1-byte case using {{VPDPBUSD}} is still TODO :)

But it seems really wrong to plumb it via VectorSimilarity, with the user still 
supplying float values. It still "requires" the codec to support the additional 
width but in a very nonstraightforward way. Seems to be the worst of both 
worlds.

> Quantize vector values
> ----------------------
>
>                 Key: LUCENE-10577
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10577
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The {{KnnVectorField}} api handles vectors with 4-byte floating point values. 
> These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values
> 2. Approximate nearest -neighbor search
> The main point of this addition was to provide the search capability, and to 
> support that it is not really necessary to store vectors in full precision. 
> Perhaps users may also be willing to retrieve values in lower precision for 
> whatever purpose those serve, if they are able to store more samples. We know 
> that 8 bits is enough to provide a very near approximation to the same 
> recall/performance tradeoff that is achieved with the full-precision vectors. 
> I'd like to explore how we could enable 4:1 compression of these fields by 
> reducing their precision.
> A few ways I can imagine this would be done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide 
> their data in reduced-precision format and give control over the quantization 
> to them. It would have a major impact on the Lucene API surface though, 
> essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would 
> require no or perhaps very limited change to the existing API to enable the 
> feature.
> I've been exploring (2), and what I find is that we can achieve very good 
> recall results using dot-product similarity scoring by simple linear scaling 
> + quantization of the vector values, so long as  we choose the scale that 
> minimizes the quantization error. Dot-product is amenable to this treatment 
> since vectors are required to be unit-length when used with that similarity 
> function. 
>  Even still there is variability in the ideal scale over different data sets. 
> A good choice seems to be max(abs(min-value), abs(max-value)), but of course 
> this assumes that the data set doesn't have a few outlier data points. A 
> theoretical range can be obtained by 1/sqrt(dimension), but this is only 
> useful when the samples are normally distributed. We could in theory 
> determine the ideal scale when flushing a segment and manage this 
> quantization per-segment, but then numerical error could creep in when 
> merging.
> I'll post a patch/PR with an experimental setup I've been using for 
> evaluation purposes. It is pretty self-contained and simple, but has some 
> drawbacks that need to be addressed:
> 1. No automated mechanism for determining quantization scale (it's a constant 
> that I have been playing with)
> 2. Converts from byte/float when computing dot-product instead of directly 
> computing on byte values
> I'd like to get people's feedback on the approach and whether in general we 
> should think about doing this compression under the hood, or expose a 
> byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty 
> compelling and we should pursue something.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10577) Quantize vector values

Reply via email to