[GitHub] [lucene] jtibshirani commented on pull request #239: LUCENE-10040: Handle deletions in nearest vector search

GitBox Mon, 31 Oct 2022 14:32:38 -0700


jtibshirani commented on PR #239:
URL: https://github.com/apache/lucene/pull/239#issuecomment-1297711772


   @harishankar-gopalan sorry for the slow response! Your overall understanding 
is right. In Lucene, deletions are handled by marking a document as deleted 
using a 'tombstone'. The index structures are not actually updated (this 
includes the HNSW graph). 
   
   In response to your questions...
   1. Yes, when there are deletions, we make sure to expand the search space to 
retrieve the top `k`. We use a strategy similar to the one many vector search 
engines use for kNN with filtering. During the HNSW search, we make sure to 
exclude deleted nodes from the final candidate set, but deleted nodes are still 
used for traversing the graph.
   2. No, there is no work in that direction. This is because Lucene segments 
are never updated after they are first written (except under rare 
circumstances). Lucene's immutable segment model is core to its design, and 
it's the reason we use 'tombstones' instead of modifying the graph in place.
   
   For segment merges, we combine the vectors across all segments and build a 
new graph from scratch. We make sure to skip over deleted documents during this 
merge, so they have no effect on the time it takes to build the graph. This 
merge process is quite expensive and we're brainstorming ways of making it 
faster (https://github.com/apache/lucene/issues/11354).
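
   The merge behavior described above can be sketched as follows. This is an 
illustration under stated assumptions, not Lucene's merge code: `merge_segments` 
and `max_conn` are hypothetical names, each segment is modeled as a pair of 
(vectors, live flags), and a brute-force nearest-neighbor pass stands in for the 
real HNSW graph construction.

```python
def merge_segments(segments, max_conn=2):
    """Combine live vectors from all segments and build a fresh neighbor graph.

    segments: list of (vectors, live) pairs, where live[i] is False for a
    tombstoned document. Deleted vectors are dropped before any graph work,
    so they add nothing to the build cost.
    """
    # Step 1: gather only the live vectors, in segment order.
    merged = [
        vec
        for vectors, live in segments
        for vec, is_live in zip(vectors, live)
        if is_live
    ]

    def dist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Step 2: build the graph from scratch. Brute force is a stand-in for
    # HNSW insertion; each node links to its max_conn nearest neighbors.
    graph = {}
    for i, vec in enumerate(merged):
        others = sorted(
            (j for j in range(len(merged)) if j != i),
            key=lambda j: dist(vec, merged[j]),
        )
        graph[i] = others[:max_conn]
    return merged, graph


seg_a = ([(0.0,), (1.0,)], [True, False])          # second doc deleted
seg_b = ([(2.0,), (3.0,), (4.0,)], [True, True, False])  # last doc deleted

merged, graph = merge_segments([seg_a, seg_b], max_conn=1)
print(merged)  # [(0.0,), (2.0,), (3.0,)]
print(graph)   # {0: [1], 1: [2], 2: [1]}
```

   The old per-segment graphs are not reused here at all, which mirrors why the 
merge is expensive: every surviving vector is linked into the new graph again.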


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


