msokolov opened a new pull request, #13779: URL: https://github.com/apache/lucene/pull/13779
addresses https://github.com/apache/lucene/issues/13778 Key things in this PR: 1. Introduces abstract `KnnVectorValues` from which `ByteVectorValues` and `FloatVectorValues` derive; 2. Folds `RandomAccessVectorValues` into `KnnVectorValues` thus eliminating some casts. 3. `RandomAccessVectorValues.Floats` becomes `FloatVectorValues` and `RandomAccessVectorValues.Bytes` becomes `ByteVectorValues`. `RandomAccessQuantizedByteVectorValues` folded into `QuantizedByteVectorValues`. 4. `IndexInput getSlice()` moved to a new `HasIndexSlice` interface. 5. Introduces `VectorEncoding KnnVectorValues.getEncoding()` to enable type-specific branches in a few places where we are dealing with abstract `KnnVectorValues` (tests only IIRC). Also used that to provide a default `getVectorByteLength()`. 6. `KnnVectorValues` no longer extends `DocIdSetIterator`; rather it provides a tightly-coupled `iterator()`. This enables refactoring common iteration patterns that were repeated many times in the code base. This new iterator, `DocIndexIterator` provides an additional method `index()` analogous to `IndexedDISI`. Some of the methods on `KnnVectorValues` have default impls that throw `UnsupportedOperationException` enabling subclasses to provide partial implementations and relying on testing to catch missing required methods. I'd like feedback on this. Should we provide implementations we never use, just to make these classes complete? That didn't make sense to me. But the previous alternative of attempting to provide strict adherence to declarative contracts was becoming in my view, overly restrictive and leading to hard-to-maintain code. Some of these readers would only ever be used iteratively. Random access is required for search, but not used when merging the values themselves, and when we merge we do search, but using a temporary file so that searching is always done over a file-based value. Random access also gets used during merging when the index is sorted, again this is provided by specialized readers, so not every reader needs to implement random access. But the API maintenance is greatly simplified if we allow partial implementation. Anyway that is the idea I am trying out here. Can we live with a little less API purity and gain some simplicity and less boilerplate? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org