benwtrent commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351900658

   OK, step one of this is done with the `KnnByteVectorQuery`. Next I am 
working on a new `KnnByteVectorField`, which will require some refactoring of 
`LeafReader` or `VectorValues`.
   
   Right now, `VectorValues` assumes all vector values are `float[]` and 
"expands" the bytes, needlessly.
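
To make the cost concrete, here is a minimal sketch (not Lucene's actual code; the class and method names are hypothetical) of widening a byte vector to `float[]` before scoring, compared with scoring on the bytes directly:

```java
public final class ByteExpansionSketch {
  // Hypothetical helper: widens each byte to a float. The copy needs four
  // bytes of element storage for every one byte in the original vector.
  static float[] expand(byte[] vector) {
    float[] out = new float[vector.length];
    for (int i = 0; i < vector.length; i++) {
      out[i] = vector[i];
    }
    return out;
  }

  // Dot product over expanded vectors (what a float-only API forces).
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Dot product directly on the bytes: no allocation, integer arithmetic.
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] a = {1, -2, 3};
    byte[] b = {4, 5, -6};
    System.out.println(dotProduct(a, b));
    System.out.println(dotProduct(expand(a), expand(b)));
  }
}
```

Both calls compute the same value; the second one only adds two temporary `float[]` copies per comparison.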
   
   There are many ways to approach this, and I am honestly not sure which way 
to go here (being unfamiliar with the typical API patterns).
   
   It seems there are a handful of seemingly valid options:
   
    * 1. Add a method to `VectorValues` (which is returned via `LeafReader`) 
that explicitly returns `byteVectorValue()` and will throw if `vectorValue()` 
is called when the encoding is `BYTE` (and throw on `byteVectorValue()` if 
encoding is `FLOAT32`).
       * This doesn't seem like the best option to me, though we already require 
users to choose the correct methods based on the vector encoding.
    * 2. Add a new `AbstractVectorValues<T>` where `vectorValue()` returns `T`. 
`LeafReader` would be changed to return that instead of `VectorValues`. 
       * There doesn't seem to be any precedent for using generics like this 
anywhere, and this breaks with how we handle DocValues. The main difference 
between vectors and doc values is that numeric doc values are all returned as 
`long`, but really could be any number whose bytes FIT in a long...
    * 3. Add a new `LeafReader#getByteVectorValues()` that returns a 
`ByteVectorValues` object. The main downside here is that we are not making it 
flexible for adding new vector kinds in the future. We will want to add boolean 
vectors & hamming distance in the future. These are important for image search.
    * 4. Simply make `VectorValues` always return `BytesRef` for everything and 
provide a similarity that knows the encoding and can decode the `float[]` if 
that is the encoding. This feels closer to what we do with doc values (taking 
`DoubleDocValuesField` & `DoubleValuesSource` as prior art for how that 
interaction is done with `NumericDocValues`).
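
For illustration only, options 1 and 2 could be sketched roughly as follows. These are standalone toy types, not Lucene classes: `DualAccessorValues`, the `Encoding` enum, and the factory methods are hypothetical names, and the real `VectorValues` iteration API is left out entirely.

```java
public final class VectorValuesOptionsSketch {
  enum Encoding { BYTE, FLOAT32 }

  // Option 1 (toy version): a single type with two accessors; calling the
  // accessor that does not match the encoding throws at runtime.
  static final class DualAccessorValues {
    private final Encoding encoding;
    private final float[] floats; // non-null only for FLOAT32
    private final byte[] bytes;   // non-null only for BYTE

    private DualAccessorValues(Encoding encoding, float[] floats, byte[] bytes) {
      this.encoding = encoding;
      this.floats = floats;
      this.bytes = bytes;
    }

    static DualAccessorValues ofFloats(float[] v) {
      return new DualAccessorValues(Encoding.FLOAT32, v, null);
    }

    static DualAccessorValues ofBytes(byte[] v) {
      return new DualAccessorValues(Encoding.BYTE, null, v);
    }

    float[] vectorValue() {
      if (encoding != Encoding.FLOAT32) {
        throw new UnsupportedOperationException("encoding is " + encoding);
      }
      return floats;
    }

    byte[] byteVectorValue() {
      if (encoding != Encoding.BYTE) {
        throw new UnsupportedOperationException("encoding is " + encoding);
      }
      return bytes;
    }
  }

  // Option 2 (toy version): the element type is a type parameter, so a
  // mismatched access is a compile error rather than a runtime exception.
  interface AbstractVectorValues<T> {
    T vectorValue();
  }

  public static void main(String[] args) {
    DualAccessorValues byteValues = DualAccessorValues.ofBytes(new byte[] {1, 2, 3});
    System.out.println(byteValues.byteVectorValue().length);
    try {
      byteValues.vectorValue(); // wrong accessor for BYTE encoding
    } catch (UnsupportedOperationException e) {
      System.out.println("option 1 threw: " + e.getMessage());
    }

    AbstractVectorValues<byte[]> typed = () -> new byte[] {1, 2, 3};
    byte[] v = typed.vectorValue(); // no cast, no runtime encoding check
    System.out.println(v.length);
  }
}
```

The trade-off the sketch shows: option 1 keeps one concrete type but moves the encoding check to runtime, while option 2 pushes the choice into the type signature that `LeafReader` would have to expose.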
   
   
   @jpountz @rmuir do y'all have opinions here? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

