jpountz commented on issue #14758:
URL: https://github.com/apache/lucene/issues/14758#issuecomment-2944618497

   If filtered performance is critical, then I wonder if there could be better 
ways, e.g. using a vector search algorithm from the IVF family instead of HNSW 
and configuring an index sort on the same fields that you plan on using as an 
ID in your proposal, so that filtering would boil down to intersecting doc IDs 
with a range. Another option is to create multiple indexes: e.g. an e-commerce 
catalog could have one Lucene index for its most popular category and another 
Lucene index for all other categories. When there is no filter, you would 
search both indexes through a `MultiReader`; otherwise, you would pick which 
index to search based on the category filter. Or a mix of these two ideas.
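To illustrate why the index-sort idea turns filtering into a range intersection, here is a plain-Java model. It is not Lucene API and not how any Lucene codec is implemented. It assumes that doc IDs were assigned in order of a hypothetical integer "category" key (in Lucene the sort itself would be configured with `IndexWriterConfig#setIndexSort`), that the IVF centroids are already computed (hard-coded here), and that only the single nearest cluster is probed and scored by brute force. Because every category occupies a contiguous doc ID range, restricting a cluster's sorted posting list to that category costs two binary searches instead of a per-document filter check.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Conceptual sketch, not Lucene API: an IVF-style search over documents whose
 * doc IDs follow a hypothetical "category" key (the index sort). The category
 * filter becomes a doc ID range [start, end), applied to a cluster's sorted
 * posting list with two binary searches.
 */
public class IvfSortedFilterSketch {

    // Smallest index i in sorted[from, to) with sorted[i] >= key (or `to` if none).
    static int lowerBound(int[] sorted, int from, int to, int key) {
        int lo = from, hi = to;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Index of the centroid with the highest dot product with v (first wins on ties).
    static int nearestCentroid(float[][] centroids, float[] v) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (dot(centroids[c], v) > dot(centroids[best], v)) best = c;
        }
        return best;
    }

    /**
     * @param categories category of each doc, non-decreasing because docs are index-sorted
     * @param vectors    vector of each doc, indexed by doc ID
     * @param centroids  hypothetical, pre-computed IVF centroids
     * @return best-scoring doc ID of the given category in the probed cluster, or -1
     */
    static int filteredSearch(int[] categories, float[][] vectors, float[][] centroids,
                              float[] query, int category) {
        // 1. Posting lists: doc IDs per cluster, filled in increasing doc ID order.
        List<List<Integer>> postings = new ArrayList<>();
        for (int c = 0; c < centroids.length; c++) postings.add(new ArrayList<>());
        for (int doc = 0; doc < vectors.length; doc++) {
            postings.get(nearestCentroid(centroids, vectors[doc])).add(doc);
        }
        // 2. Thanks to the index sort, the filter is one contiguous doc ID range.
        int start = lowerBound(categories, 0, categories.length, category);
        int end = lowerBound(categories, 0, categories.length, category + 1);
        // 3. Probe the closest cluster and keep only doc IDs in [start, end).
        int[] list = postings.get(nearestCentroid(centroids, query))
                .stream().mapToInt(Integer::intValue).toArray();
        int from = lowerBound(list, 0, list.length, start);
        int to = lowerBound(list, from, list.length, end);
        int bestDoc = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = from; i < to; i++) {
            float score = dot(vectors[list[i]], query);
            if (score > bestScore) { bestScore = score; bestDoc = list[i]; }
        }
        return bestDoc;
    }

    public static void main(String[] args) {
        int[] categories = {1, 1, 1, 2, 2, 2};
        float[][] vectors = {
            {1f, 0f}, {0f, 1f}, {0.9f, 0.1f},
            {0.8f, 0.2f}, {0.1f, 0.9f}, {1f, 0.05f}};
        float[][] centroids = {{1f, 0f}, {0f, 1f}};
        // Category 2 is doc IDs [3, 6); the probed cluster holds docs 0, 2, 3, 5.
        System.out.println(filteredSearch(categories, vectors, centroids,
                new float[] {1f, 0f}, 2)); // prints 5
    }
}
```

A real IVF implementation would probe several clusters and collect the top k rather than the top 1, but the filtering step stays the same: a slice of an already-sorted list, with no per-document filter lookups.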
   
   My concern with your proposal is that it would be quite invasive in terms of 
API (IDs would need to be configurable through the `IndexableField` API, 
exposed by `VectorValues`, etc.) and hard to maintain (every future 
`KnnVectorsFormat` would have to deal with this `ID` concept).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
