atris commented on PR #10702:
URL: https://github.com/apache/pinot/pull/10702#issuecomment-1528969242

   > This won't work. We need to keep 2 bitmaps, one for valid docs and one for 
queryable docs. If we only keep valid docs, and count delete as removing it 
from the valid doc, the next event won't work as expected because we lost track 
of the delete event.
   > 
   > There is already a design (#10452) that covers this, so let's not duplicate 
the work.
   > 
   > @navina will post the implementation soon
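
For context, the two-bitmap idea from the quoted comment can be sketched roughly as below. This is only an illustration under assumptions: the class `TwoBitmapSketch` and its methods are hypothetical, it uses `java.util.BitSet` rather than whatever bitmap type the real code uses, and it assumes events for a key arrive in order.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two-bitmap idea; not the actual Pinot implementation.
class TwoBitmapSketch {
    // Latest record per primary key, delete records included.
    private final BitSet validDocIds = new BitSet();
    // Latest record per primary key, delete records excluded (what queries see).
    private final BitSet queryableDocIds = new BitSet();
    // Primary key -> docId of the latest record for that key.
    private final Map<String, Integer> latestDocId = new HashMap<>();

    // Assumes events for a given key arrive in order.
    void add(String key, int docId, boolean isDelete) {
        Integer previous = latestDocId.put(key, docId);
        if (previous != null) {
            // The older record is superseded in both bitmaps.
            validDocIds.clear(previous);
            queryableDocIds.clear(previous);
        }
        // A delete record stays valid so that the delete event is not lost...
        validDocIds.set(docId);
        // ...but it is never visible to queries.
        if (!isDelete) {
            queryableDocIds.set(docId);
        }
    }

    boolean isValid(int docId) {
        return validDocIds.get(docId);
    }

    boolean isQueryable(int docId) {
        return queryableDocIds.get(docId);
    }
}
```

With a single bitmap, the delete record would simply be cleared and nothing would remain to compare the next event against.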
   
   @Jackie-Jiang and I discussed this offline and have agreed to iterate on 
this PR and look into solving the issues with snapshotting the validDocIDs map.
   
   The main issue that I wish to avoid is the separate map for queryable 
docIDs. The three issues we face today with the tombstone approach are: 
   1. Losing the deleted docID when the consuming segment seals.
   2. The delete tombstone record not loading when we snapshot validDocIDs.
   3. Losing the deletion timestamp and hence being unable to handle 
out-of-order events.
   
   This PR solves 3 by maintaining the full record of the deletion in the 
segment and the primary key index, as demonstrated in the added unit tests. 
Issue 1 does not matter as long as we are able to rebuild the primary key index 
with the deleted primary key records present.
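
To illustrate why keeping the deletion record in the primary key index helps with out-of-order events, here is a minimal sketch. It is not the code in this PR: `TombstoneIndexSketch`, `Entry`, and `apply` are hypothetical names, and letting a later event with an equal timestamp win is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the PR's actual primary key index.
class TombstoneIndexSketch {
    // Latest known state for one primary key, including deletions.
    static final class Entry {
        final int docId;
        final long timestamp;
        final boolean deleted;

        Entry(int docId, long timestamp, boolean deleted) {
            this.docId = docId;
            this.timestamp = timestamp;
            this.deleted = deleted;
        }
    }

    private final Map<String, Entry> index = new HashMap<>();

    // Applies an event; returns true if it became the latest state for the key.
    boolean apply(String key, int docId, long timestamp, boolean deleted) {
        Entry current = index.get(key);
        // The tombstone keeps its timestamp, so an older event that
        // arrives late loses the comparison instead of resurrecting the key.
        if (current != null && timestamp < current.timestamp) {
            return false;
        }
        index.put(key, new Entry(docId, timestamp, deleted));
        return true;
    }

    boolean isQueryable(String key) {
        Entry entry = index.get(key);
        return entry != null && !entry.deleted;
    }

    public static void main(String[] args) {
        TombstoneIndexSketch idx = new TombstoneIndexSketch();
        idx.apply("pk1", 0, 100L, false);                    // insert at t=100
        idx.apply("pk1", 1, 300L, true);                     // delete at t=300
        boolean accepted = idx.apply("pk1", 2, 200L, false); // late update at t=200
        System.out.println(accepted + " " + idx.isQueryable("pk1")); // prints "false false"
    }
}
```

If the tombstone were dropped from the index after the delete, the late update at t=200 would have nothing to be compared against and would wrongly bring the key back.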
   
   The main challenge is 2, and we need to think about it a bit more. Will keep 
the group posted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
