zacharymorn commented on issue #11428: URL: https://github.com/apache/lucene/issues/11428#issuecomment-1403063251
Hi @jpountz @dnhatn, I have been looking at this issue and studying the soft-delete code, and I now have a general understanding of its complexity. If I understand it correctly, here is the main work needed to migrate soft-delete handling into live docs:

1. Define a new live docs (`.liv`) format that also encodes soft deletes (the naive approach is probably two bits per doc, one for hard deletes and one for soft deletes).
2. Update `SoftDeletesRetentionMergePolicy` to apply / combine the soft- and hard-delete bitsets, so that the upper layers are not affected by the `.liv` format change.
3. Update the soft-delete APIs to remove the need to specify a dedicated doc-values field.
4. Update the internal soft-delete logic to build and maintain a bitset separate from the hard-delete one, and write both to disk.

However, I have a few questions that I couldn't find answers to in the code, comments, or original tickets. I'm wondering if you have some insight into these:

1. Was there any historical context or reason for not encoding soft deletes into live docs in the first place? From https://issues.apache.org/jira/browse/LUCENE-8233, it looks like the API design was proposed and accepted to give users some flexibility?
2. Since applications can currently write any numeric value into the soft-delete field, and may use that value as part of the soft-delete retention query, migrating the encoding to a potentially binary `.liv` format may no longer support that use case. Is that something we could deprecate directly? Here's an example of such a retention query:

```java
// Soft-deleted docs whose "soft_delete" doc value is 2, 3 or 100 are retained during merges.
MergePolicy policy = new SoftDeletesRetentionMergePolicy(
    "soft_delete",
    () -> new DocValuesNumbersQuery("soft_delete", Arrays.asList(2L, 3L, 100L)),
    new LogDocMergePolicy());
```
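To make the "two bits per doc" idea more concrete, here is a minimal sketch, assuming one bit for hard deletes and one for soft deletes, packed 32 docs per `long`. The class `TwoBitDeletes` and its methods are hypothetical and not part of Lucene; this is not the actual `.liv` format, only an illustration of how both states could live in one structure.

```java
/**
 * Hypothetical sketch: two bits per document packed into a long[],
 * 32 documents per 64-bit word. The low bit of a doc's pair marks a
 * hard delete, the high bit marks a soft delete. Not Lucene's .liv format.
 */
final class TwoBitDeletes {
  private static final long HARD = 0b01L;
  private static final long SOFT = 0b10L;

  private final long[] words;
  private final int maxDoc;

  TwoBitDeletes(int maxDoc) {
    this.maxDoc = maxDoc;
    // 32 two-bit slots per word, rounded up; no bounds checking in this sketch.
    this.words = new long[(maxDoc + 31) / 32];
  }

  /** Returns the two flag bits of a doc (0 = live, 1 = hard, 2 = soft, 3 = both). */
  private long bits(int doc) {
    int shift = (doc & 31) << 1; // bit offset of this doc's pair inside its word
    return (words[doc >>> 5] >>> shift) & 0b11L;
  }

  private void set(int doc, long flag) {
    int shift = (doc & 31) << 1;
    words[doc >>> 5] |= flag << shift;
  }

  void markHardDeleted(int doc) { set(doc, HARD); }

  void markSoftDeleted(int doc) { set(doc, SOFT); }

  boolean isHardDeleted(int doc) { return (bits(doc) & HARD) != 0; }

  boolean isSoftDeleted(int doc) { return (bits(doc) & SOFT) != 0; }

  /** Live for normal searches: neither hard- nor soft-deleted. */
  boolean isLive(int doc) { return bits(doc) == 0; }

  /** Live when only hard deletes are taken into account. */
  boolean isLiveIgnoringSoftDeletes(int doc) { return !isHardDeleted(doc); }

  /** Number of docs hidden from normal searches (hard- or soft-deleted). */
  int numDeleted() {
    int count = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      if (!isLive(doc)) {
        count++;
      }
    }
    return count;
  }
}
```

Keeping the two bits separate means the hard-delete state can always be read on its own, so a merge policy could still tell a soft-deleted doc it may retain apart from a hard-deleted doc it must drop.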
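For the step of combining the soft- and hard-delete bitsets, here is a rough sketch using `java.util.BitSet`, where a set bit means "deleted". `DeleteViews` is hypothetical, and the retention set is passed in precomputed, standing in for the docs a retention query would match; it only illustrates the set logic, not how `SoftDeletesRetentionMergePolicy` is implemented.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: deriving the views upper layers need from separate
 * hard- and soft-delete bitsets (set bit = deleted). Inputs are not mutated.
 */
final class DeleteViews {
  private DeleteViews() {}

  /** Live docs for normal searches: neither hard- nor soft-deleted. */
  static BitSet searchLiveDocs(BitSet hard, BitSet soft, int maxDoc) {
    BitSet deleted = (BitSet) hard.clone();
    deleted.or(soft);         // deleted either way
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc);      // start with every doc live
    live.andNot(deleted);     // clear every deleted doc
    return live;
  }

  /**
   * Docs a retention-aware merge would carry over: all search-live docs, plus
   * soft-deleted docs in the retention set, unless they are also hard-deleted.
   */
  static BitSet docsToKeepOnMerge(BitSet hard, BitSet soft, BitSet retained, int maxDoc) {
    BitSet keep = searchLiveDocs(hard, soft, maxDoc);
    BitSet retainedSoft = (BitSet) soft.clone();
    retainedSoft.and(retained); // soft-deleted docs the retention set still wants
    retainedSoft.andNot(hard);  // a hard delete always wins
    keep.or(retainedSoft);
    return keep;
  }
}
```

In this sketch only `searchLiveDocs` would need to be exposed to searchers, while the merge-time view is the one that still needs the separate soft-delete bits.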