zacharymorn commented on issue #11428:
URL: https://github.com/apache/lucene/issues/11428#issuecomment-1403063251

   Hi @jpountz @dnhatn, I have been looking at this issue and studying the soft delete code, and I now have a general understanding of its complexity. If I understand it correctly, here is the major work that needs to be done if we were to migrate soft delete handling to live docs:
   1. Define a new liv (live docs) format that also encodes soft deletes (the naive way is probably to use two bits per doc, one indicating a hard delete and one indicating a soft delete?).
   2. Update `SoftDeletesRetentionMergePolicy` to apply / combine the soft and hard delete bitsets, so that the upper layers are not impacted by the liv format change.
   3. Update the soft delete APIs to remove the need to specify a dedicated doc values (dv) field.
   4. Update the internal soft delete logic to potentially build and maintain a bitset separate from the hard delete one, and write both to disk.
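To make items 1 and 4 a bit more concrete, here is a minimal sketch of the naive two-bits-per-doc idea, assuming one bit marks a hard delete and the other a soft delete, packed into a `long[]`. `TwoBitLiveDocs` and its methods are hypothetical names for illustration only; this is not Lucene's `.liv` format or any existing Lucene API.

```java
import java.util.Objects;

// Hypothetical sketch only: NOT Lucene's .liv format or an existing Lucene class.
// Packs two bits per doc into a long[]: bit 0 = hard-deleted, bit 1 = soft-deleted.
final class TwoBitLiveDocs {
  private static final int HARD = 0b01;
  private static final int SOFT = 0b10;

  private final long[] words; // 32 docs per 64-bit word
  private final int maxDoc;

  TwoBitLiveDocs(int maxDoc) {
    this.maxDoc = maxDoc;
    this.words = new long[(maxDoc + 31) >>> 5];
  }

  // Reads the 2-bit flags stored for a doc.
  private int flags(int doc) {
    Objects.checkIndex(doc, maxDoc);
    int shift = (doc & 31) << 1;
    return (int) ((words[doc >>> 5] >>> shift) & 0b11L);
  }

  // ORs a flag into the doc's 2-bit slot; deletes are never undone here.
  private void mark(int doc, int flag) {
    Objects.checkIndex(doc, maxDoc);
    int shift = (doc & 31) << 1;
    words[doc >>> 5] |= ((long) flag) << shift;
  }

  void hardDelete(int doc) { mark(doc, HARD); }
  void softDelete(int doc) { mark(doc, SOFT); }

  boolean isHardDeleted(int doc) { return (flags(doc) & HARD) != 0; }
  boolean isSoftDeleted(int doc) { return (flags(doc) & SOFT) != 0; }

  // Combined view for search: live only if neither hard- nor soft-deleted.
  boolean isLive(int doc) { return flags(doc) == 0; }

  // View that ignores soft deletes (assumption): a retention merge policy could
  // start from this and keep only the soft-deleted docs matching its retention query.
  boolean isLiveIgnoringSoftDeletes(int doc) { return !isHardDeleted(doc); }
}
```

Keeping both bits would let searches use the combined view while a retention merge policy (item 2) could still tell soft-deleted docs apart from hard-deleted ones; the obvious cost is roughly doubling the size of a one-bit-per-doc live docs bitset.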
   
   However, I have a few questions that I couldn't find answers to in the code, comments, or original task tickets. Do you have any insights into these?
   1. Is there any historical context or reason for not encoding soft deletes into liv in the first place? From https://issues.apache.org/jira/browse/LUCENE-8233, it looks like the API design was proposed and accepted to give users some flexibility?
   2. Since applications can currently write any numeric value into the soft delete field, and may use that value as part of the soft delete retention query, migrating the encoding to a potentially binary liv format would no longer support that use case, right? Is that something we could deprecate directly? Here's an example of such a retention query:
   
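   ```java
   import java.util.Arrays;
   import org.apache.lucene.index.*; // MergePolicy, LogDocMergePolicy, SoftDeletesRetentionMergePolicy
   import org.apache.lucene.sandbox.search.DocValuesNumbersQuery;

   // During merges, keep soft-deleted docs whose "soft_delete" value is 2, 3 or 100
   MergePolicy policy = new SoftDeletesRetentionMergePolicy("soft_delete",
       () -> new DocValuesNumbersQuery("soft_delete", Arrays.asList(2L, 3L, 100L)),
       new LogDocMergePolicy());
   ```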
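A small illustration of why the retention query above depends on the stored value, assuming an application writes arbitrary numbers into the soft delete field today. `SoftDeleteValueDemo` and its methods are hypothetical and only model the decision in plain Java; they are not Lucene code.

```java
import java.util.Set;

// Hypothetical illustration, not Lucene code. A null value means the doc was never
// soft-deleted; otherwise it is whatever number the application wrote.
final class SoftDeleteValueDemo {
  // Mirrors the retention query above: keep docs whose value is 2, 3 or 100.
  static final Set<Long> RETAINED_VALUES = Set.of(2L, 3L, 100L);

  // Current model: the retention decision can inspect the stored value.
  // The null check comes first because Set.of(...).contains(null) throws.
  static boolean retainByValue(Long softDeleteValue) {
    return softDeleteValue != null && RETAINED_VALUES.contains(softDeleteValue);
  }

  // Bit-only model: the value collapses to "soft-deleted or not".
  static boolean softDeletedBit(Long softDeleteValue) {
    return softDeleteValue != null;
  }
}
```

For example, docs soft-deleted with values `1L` and `2L` both map to `true` under `softDeletedBit`, yet only the second should be retained, so a single soft delete bit alone cannot drive this kind of retention query.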
   
   

