benwtrent commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351900658
OK, step one of this is done with the `KnnByteVectorQuery`. Next I am approaching a new `KnnByteVectorField`, and this will require some refactoring of `LeafReader` or `VectorValues`. Right now, `VectorValues` assumes all vector values are `float[]` and needlessly "expands" the bytes.

There are many ways to approach this, and I am honestly not sure which way to go (being unfamiliar with typical API patterns). There seem to be a handful of valid options:

1. Add a method to `VectorValues` (which is returned via `LeafReader`) that explicitly returns `byteVectorValue()`. It would throw if `vectorValue()` is called when the encoding is `BYTE`, and throw on `byteVectorValue()` if the encoding is `FLOAT32`.
   * This doesn't seem like the best option to me, though we already require users to choose the correct methods based on the vector encoding.
2. Add a new `AbstractVectorValues<T>` where `vectorValue()` returns `T`. `LeafReader` would be changed to return that instead of `VectorValues`.
   * There doesn't seem to be any precedent for using generics like this anywhere, and this breaks with how we handle DocValues. The main difference between vectors and doc values is that numeric doc values are all returned as `long`, but could really be any number whose bytes fit in a long.
3. Add a new `LeafReader#getByteVectorValues()` that returns a `ByteVectorValues` object.
   * The main downside here is that it is not flexible for adding new vector kinds in the future. We will want to add boolean vectors and Hamming distance in the future; these are important for image search.
4. Simply make `VectorValues` always return `BytesRef` for everything, and provide a similarity that knows the encoding and can decode the `float[]` if that is the encoding.
   * This feels closer to what we do with doc values (taking `DoubleDocValuesField` and `DoubleValuesSource` as prior art for how that interaction is done with `NumericDocValues`).

@jpountz @rmuir do y'all have opinions here?
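
For illustration, options 1 and 2 could look roughly like the following. This is only a minimal sketch to compare the API shapes, not actual Lucene code: `VectorValuesSketch`, `Encoding`, `DualAccessorValues`, and the factory methods are hypothetical names, the classes hold a single vector, and iteration over documents is left out entirely.

```java
public class VectorValuesSketch {

  /** Hypothetical stand-in for the vector encoding; not the real Lucene enum. */
  enum Encoding { BYTE, FLOAT32 }

  /** Option 1: one class with two accessors; the accessor that does not match the encoding throws. */
  static final class DualAccessorValues {
    private final Encoding encoding;
    private final float[] floats; // null when the encoding is BYTE
    private final byte[] bytes;   // null when the encoding is FLOAT32

    private DualAccessorValues(Encoding encoding, float[] floats, byte[] bytes) {
      this.encoding = encoding;
      this.floats = floats;
      this.bytes = bytes;
    }

    static DualAccessorValues ofFloats(float[] vector) {
      return new DualAccessorValues(Encoding.FLOAT32, vector, null);
    }

    static DualAccessorValues ofBytes(byte[] vector) {
      return new DualAccessorValues(Encoding.BYTE, null, vector);
    }

    float[] vectorValue() {
      if (encoding != Encoding.FLOAT32) {
        throw new UnsupportedOperationException("vectorValue() called but encoding is " + encoding);
      }
      return floats;
    }

    byte[] byteVectorValue() {
      if (encoding != Encoding.BYTE) {
        throw new UnsupportedOperationException("byteVectorValue() called but encoding is " + encoding);
      }
      return bytes;
    }
  }

  /** Option 2: a generic base type, so the compiler enforces the element type. */
  abstract static class AbstractVectorValues<T> {
    abstract T vectorValue();
  }

  static AbstractVectorValues<byte[]> byteValues(byte[] vector) {
    return new AbstractVectorValues<byte[]>() {
      @Override
      byte[] vectorValue() {
        return vector;
      }
    };
  }

  static AbstractVectorValues<float[]> floatValues(float[] vector) {
    return new AbstractVectorValues<float[]>() {
      @Override
      float[] vectorValue() {
        return vector;
      }
    };
  }

  public static void main(String[] args) {
    // Option 1: a mismatch only surfaces at runtime.
    DualAccessorValues dual = DualAccessorValues.ofBytes(new byte[] {1, 2, 3});
    try {
      dual.vectorValue();
    } catch (UnsupportedOperationException e) {
      System.out.println("option 1 rejected the call: " + e.getMessage());
    }

    // Option 2: the element type is part of the signature, so no runtime check is needed.
    AbstractVectorValues<byte[]> typed = byteValues(new byte[] {1, 2, 3});
    System.out.println("option 2 vector length: " + typed.vectorValue().length);
  }
}
```

The trade-off this makes visible: option 1 keeps a single non-generic type but pushes encoding mistakes to runtime, while option 2 catches them at compile time at the cost of introducing generics into the reader API.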