benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1795961973
@jimczi thinking about it more, it seems to me that a Flat* index format for vector search will require a different API. Right now, kNN search assumes the user provides pre-filters. However, when doing a flat vector search, there is no such thing as a "pre-filter". In fact, a flat vector search should behave like a typical scorer, iterating over matching documents and scoring them.

The question then becomes: could we just use `LeafReader#getFloatVectorValues`? We might be able to, but this could be a bigger change and might not buy much improvement. It seems counterintuitive to de-quantize every stored vector back into a `float[]` instead of just quantizing the query vector into a `byte[]` once. It almost seems like these flat formats should return a `RandomVectorScorer` instead of satisfying the typical `searchNearestVectors` interface.

Since this is a larger API discussion, do we think we can move forward with the way it is now (quantization for HNSW and other vector indices) and iterate on exposing flat vectors in a separate line of work? @jimczi @jpountz if y'all have ideas around how the API could look, please let me know!
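To make the flat-search idea concrete, here is a minimal, self-contained sketch of "quantize the query once, then score the matching documents like a normal scorer". It is not Lucene code and does not reflect the PR's implementation: the class `FlatQuantizedSearchSketch`, its methods, the fixed `[min, max]` quantization range, and the plain dot-product score are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names are Lucene APIs.
public class FlatQuantizedSearchSketch {

  /** Linearly maps each float in [min, max] to a byte in [0, 127], clamping outliers. */
  public static byte[] quantize(float[] v, float min, float max) {
    byte[] out = new byte[v.length];
    float scale = 127f / (max - min);
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(min, Math.min(max, v[i]));
      out[i] = (byte) Math.round((clamped - min) * scale);
    }
    return out;
  }

  /** Integer dot product computed directly on the quantized bytes. */
  public static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Scores only the docs listed in matchingDocs (the output of some filter or query)
   * and returns the ids of the k best, highest score first. No graph is involved,
   * and no stored vector is converted back to float[].
   */
  public static int[] search(byte[][] stored, int[] matchingDocs, byte[] query, int k) {
    // Min-heap on score, so the weakest of the current top k is evicted first.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparingInt((int[] h) -> h[1]));
    for (int doc : matchingDocs) {
      heap.offer(new int[] {doc, dotProduct(query, stored[doc])});
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<int[]> hits = new ArrayList<>(heap);
    hits.sort(Comparator.comparingInt((int[] h) -> h[1]).reversed());
    return hits.stream().mapToInt(h -> h[0]).toArray();
  }

  public static void main(String[] args) {
    byte[][] stored = {
      quantize(new float[] {1f, 0f}, 0f, 1f),
      quantize(new float[] {0f, 1f}, 0f, 1f),
      quantize(new float[] {0.5f, 0.5f}, 0f, 1f),
      quantize(new float[] {1f, 1f}, 0f, 1f),
    };
    // The query is quantized once; doc 3 is excluded by the (assumed) filter.
    byte[] query = quantize(new float[] {0.9f, 0.1f}, 0f, 1f);
    System.out.println(Arrays.toString(search(stored, new int[] {0, 1, 2}, query, 2)));
  }
}
```

The point of the sketch is only the shape of the loop: the filter drives iteration and the vectors are scored in their quantized form, which is closer to a scorer than to the `searchNearestVectors` contract.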