jtibshirani commented on PR #239: URL: https://github.com/apache/lucene/pull/239#issuecomment-1297711772
@harishankar-gopalan sorry for the slow response! Your overall understanding is right. In Lucene, deletions are handled by marking a document as deleted using a 'tombstone'. The index structures, including the HNSW graph, are not actually updated. In response to your questions:

1. Yes, when there are deletions, we make sure to expand the search space to retrieve the top `k`. We use a strategy similar to the one many vector search engines use for kNN with filtering. During the HNSW search, we make sure to exclude deleted nodes from the final candidate set, but deleted nodes are still used for traversing the graph.
2. No, there is no work in that direction. This is because Lucene segments are never updated after they are first written (except under rare circumstances). Lucene's immutable segment model is core to its design, and it's the reason we use 'tombstones' instead of modifying the graph in place. For segment merges, we combine the vectors across all segments and build a new graph from scratch. We make sure to skip over deleted documents during this merge, so they have no effect on the time it takes to build the graph. This merge process is quite expensive and we're brainstorming ways of making it faster (https://github.com/apache/lucene/issues/11354).
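
To make point 1 concrete, here is a minimal Python sketch of the idea, not Lucene's actual implementation. It assumes a single-layer proximity graph rather than a full multi-layer HNSW graph, and every name in it (`search_with_tombstones`, `vectors`, `neighbors`, `live`) is hypothetical. It only shows how tombstoned nodes can be kept out of the results while their edges are still followed.

```python
import heapq
import math


def search_with_tombstones(vectors, neighbors, live, query, k, entry=0):
    """Best-first search over a single-layer proximity graph (illustrative only).

    vectors:   list of points (tuples of floats), indexed by node id
    neighbors: adjacency lists, neighbors[i] = ids linked to node i
    live:      live[i] is False when node i is tombstoned (deleted)
    Returns up to k (distance, node_id) pairs for live nodes, nearest first.
    """
    def dist(i):
        return math.dist(vectors[i], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexplored node first
    results = []                         # max-heap (negated distance), live nodes only

    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop only once k live results exist and the closest remaining
        # candidate is farther than the worst of them. Because deleted nodes
        # never count toward k, the search keeps expanding past them.
        if len(results) == k and d > -results[0][0]:
            break
        # Deleted nodes are excluded from the result set ...
        if live[node]:
            heapq.heappush(results, (-d, node))
            if len(results) > k:
                heapq.heappop(results)  # drop the current farthest result
        # ... but their edges are still used to traverse the graph.
        for nb in neighbors[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(candidates, (dist(nb), nb))

    return sorted((-neg_d, node) for neg_d, node in results)


# Toy chain 0 - 1 - 2 - 3 - 4 where nodes 1 and 2 are tombstoned.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
live = [True, False, False, True, True]

hits = search_with_tombstones(vectors, neighbors, live, query=(4.0,), k=2)
hit_ids = [node for _, node in hits]  # [4, 3]
```

In this toy graph the only path from the entry node to the two nearest live nodes runs through the deleted nodes 1 and 2. If the traversal skipped deleted nodes entirely, nodes 3 and 4 would be unreachable, which is why the tombstoned nodes stay in the graph as stepping stones.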