Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

via GitHub Mon, 06 Nov 2023 11:06:01 -0800


benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1795961973


   @jimczi thinking about it more, it seems that a Flat* index format for 
vector search will require a different API. Right now, kNN search assumes the 
user provides pre-filters. However, when doing a flat vector search, there is 
no such thing as a "pre-filter". 
   
   In fact, a flat vector search should behave like a typical scorer, iterating 
over matching documents and scoring them.
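
   A minimal sketch of that scorer-style behavior follows. It assumes the matching documents arrive as an array of doc IDs and that the flat format can score any single document. `FlatScorer`, `DocScore`, and `topK` are hypothetical names used only for illustration, not Lucene APIs. Dot product stands in for whatever similarity the field is configured with. The filter is simply the set of docs being iterated, so there is no separate pre-filter step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class FlatSearchSketch {

    /** Hypothetical per-document scorer a flat format could expose (not a Lucene API). */
    public interface FlatScorer {
        float score(int doc);
    }

    /** A (doc, score) pair. */
    public record DocScore(int doc, float score) {}

    /** Brute-force top-k: every matching doc is scored, like a regular query scorer. */
    public static List<DocScore> topK(int[] matchingDocs, FlatScorer scorer, int k) {
        List<DocScore> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap on score: the root is the weakest of the current top k.
        PriorityQueue<DocScore> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
        for (int doc : matchingDocs) {
            float s = scorer.score(doc);
            if (heap.size() < k) {
                heap.add(new DocScore(doc, s));
            } else if (s > heap.peek().score()) {
                heap.poll();
                heap.add(new DocScore(doc, s));
            }
        }
        result.addAll(heap);
        // Highest score first.
        result.sort((a, b) -> Float.compare(b.score(), a.score()));
        return result;
    }

    public static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.7f, 0.7f}, {-1f, 0f}};
        float[] query = {1f, 0.2f};
        FlatScorer scorer = doc -> dotProduct(query, vectors[doc]);
        // Only docs 0, 2 and 3 match the (hypothetical) filter query.
        for (DocScore hit : topK(new int[] {0, 2, 3}, scorer, 2)) {
            System.out.println(hit.doc() + " " + hit.score());
        }
    }
}
```

   The cost is linear in the number of matching docs, which is why this path only makes sense when the filter is selective or the segment is small.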
   
   The question then becomes, could we just use the 
`LeafReader#getFloatVectorValues`?
   
   We might be able to, but it would be a bigger change that may not buy 
much. It also seems counterintuitive to de-quantize every stored vector back 
into a `float[]` instead of just quantizing the query vector into a `byte[]`.
   
   
   It almost seems like these flat formats should return a `RandomVectorScorer` 
instead of satisfying the typical `searchNearestVectors` interface.
   
   
   Since this is a larger API discussion, do we think we can move forward with 
the way it is now (quantization for HNSW and other vector indices) and iterate 
on exposing flat vectors in a separate line of work?
   
   @jimczi @jpountz if y'all have ideas around how the API could look, please 
let me know!

