jpountz commented on issue #14758: URL: https://github.com/apache/lucene/issues/14758#issuecomment-2944618497
If filtered search performance is critical, then I wonder if there could be better ways, e.g. using a vector search algorithm from the IVF family instead of HNSW, and configuring an index sort on the same fields that you plan on using as an ID in your proposal, so that filtering would boil down to intersecting doc IDs with a range.

Or creating multiple indexes, e.g. an e-commerce catalog could have one Lucene index for its most popular category and another Lucene index for all other categories. When there is no filter, you would search both indexes with a `MultiReader`; otherwise you would select which index to search depending on the category filter.

Or a mix of these two ideas.

My concern with your proposal is that it would be quite invasive in terms of API (IDs would need to be configurable in the `IndexableField` API, exposed by `VectorValues`, etc.) and hard to maintain (every future `KnnVectorsFormat` would have to be able to deal with this `ID` concept).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
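
To illustrate the index-sort idea from the comment above, here is a minimal sketch in plain Java. It does not use any Lucene API. The class `SortedFilterSketch`, its methods, and the `categoryByDoc` array are hypothetical names introduced only for this example. The sketch assumes documents are numbered in index-sort order, so every document of one category occupies a contiguous doc ID range, and a category filter reduces to a range check on candidate doc IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Lucene code and not the proposal's design.
class SortedFilterSketch {

    // Returns the first index i such that a[i] >= key (a.length if none).
    // Requires a to be sorted in non-decreasing order.
    static int lowerBound(int[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // categoryByDoc[docId] is the category of that doc. Because the index is
    // assumed to be sorted by category, the matching docs form the half-open
    // doc ID range [lo, hi).
    static int[] docIdRange(int[] categoryByDoc, int category) {
        int lo = lowerBound(categoryByDoc, category);
        int hi = lowerBound(categoryByDoc, (long) category + 1);
        return new int[] {lo, hi};
    }

    // Keeps only the candidate doc IDs (e.g. produced by a vector search)
    // that fall inside [lo, hi).
    static List<Integer> intersectWithRange(int[] candidates, int lo, int hi) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidates) {
            if (doc >= lo && doc < hi) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] categoryByDoc = {1, 1, 2, 2, 2, 5}; // doc IDs 0..5, sorted by category
        int[] range = docIdRange(categoryByDoc, 2); // docs 2, 3 and 4
        int[] candidates = {0, 3, 4, 5};            // hypothetical nearest neighbors
        System.out.println(intersectWithRange(candidates, range[0], range[1])); // prints [3, 4]
    }
}
```

The point of the sketch is that the filter costs two binary searches plus one comparison per candidate, with no per-document ID lookup. How an IVF-style format would actually exploit such a range inside Lucene is left open here.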