atris commented on PR #10702: URL: https://github.com/apache/pinot/pull/10702#issuecomment-1528969242
> This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count a delete as removing the doc from the valid docs, the next event won't work as expected because we lost track of the delete event.
>
> There is already a design (#10452) that covers this, so let's not duplicate the work.
>
> @navina will post the implementation soon

@Jackie-Jiang and I discussed this offline and agreed to iterate on this PR and look into solving the issues with snapshotting the validDocID map. The main thing I want to avoid is a separate map for queryable docIDs.

The three issues we face today with the tombstone approach are:

1. Losing the deleted docID when the consuming segment seals.
2. The delete tombstone record not being loaded when we snapshot validDocIDs.
3. Losing the delete timestamp, and hence being unable to handle out-of-order events.

This PR solves (3) by keeping the full record of the deletion in the segment and in the primary key index, as demonstrated in the added unit tests. (1) does not matter as long as we can rebuild the primary key index with the deleted primary key records present. The main challenge is (2), and we need to think about it a bit more. Will keep the group posted.
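To make the trade-off concrete, here is a minimal, hypothetical Java model of the single-bitmap approach described above: a delete keeps its tombstone (docId and comparison value) in the primary key index but clears the doc from validDocIds, so no separate queryable-docs bitmap exists. The names `TombstoneUpsertSketch`, `Row`, `Location`, `ingest` and `rebuildFromSnapshot` are illustrative assumptions, not Pinot's actual classes or API, and the model ignores segment sealing, multiple segments and concurrency. It shows issue (3) being handled (a late update older than the delete is rejected) and issue (2) remaining open (rebuilding the index from a validDocIds snapshot drops the tombstone, so the same late update is accepted).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an upsert primary key index that retains delete tombstones.
public class TombstoneUpsertSketch {

  // One ingested row: primary key, comparison value (e.g. event time), delete flag.
  record Row(String primaryKey, long comparisonValue, boolean isDelete) {}

  // Latest known version of a primary key; a tombstone has deleted == true.
  record Location(int docId, long comparisonValue, boolean deleted) {}

  final List<Row> segment = new ArrayList<>();          // docId == position in this list
  final Map<String, Location> primaryKeyIndex = new HashMap<>();
  final BitSet validDocIds = new BitSet();              // docs visible to queries

  // Every row is appended to the segment; returns false if it is older than the latest version.
  boolean ingest(Row row) {
    int docId = segment.size();
    segment.add(row);
    return apply(docId, row);
  }

  private boolean apply(int docId, Row row) {
    Location current = primaryKeyIndex.get(row.primaryKey());
    if (current != null && row.comparisonValue() < current.comparisonValue()) {
      return false; // out-of-order event: older than the latest version, tombstone or not
    }
    if (current != null) {
      validDocIds.clear(current.docId());
    }
    // The tombstone stays in the index, so its comparison value survives the delete.
    primaryKeyIndex.put(row.primaryKey(),
        new Location(docId, row.comparisonValue(), row.isDelete()));
    if (!row.isDelete()) {
      validDocIds.set(docId); // a tombstone is never marked valid, so no second bitmap is needed
    }
    return true;
  }

  // Rebuilds the index using only docs present in a validDocIds snapshot, as a restart might.
  static TombstoneUpsertSketch rebuildFromSnapshot(List<Row> segment, BitSet snapshot) {
    TombstoneUpsertSketch rebuilt = new TombstoneUpsertSketch();
    rebuilt.segment.addAll(segment);
    for (int docId = snapshot.nextSetBit(0); docId >= 0; docId = snapshot.nextSetBit(docId + 1)) {
      rebuilt.apply(docId, segment.get(docId));
    }
    return rebuilt;
  }

  public static void main(String[] args) {
    TombstoneUpsertSketch table = new TombstoneUpsertSketch();
    table.ingest(new Row("k1", 10, false));                 // doc 0: insert
    table.ingest(new Row("k1", 20, true));                  // doc 1: delete tombstone
    boolean late = table.ingest(new Row("k1", 15, false));  // doc 2: late update
    System.out.println(late + " " + table.validDocIds);     // prints "false {}"

    // The tombstone (doc 1) is not in validDocIds, so the snapshot cannot restore it.
    TombstoneUpsertSketch restored = rebuildFromSnapshot(table.segment, table.validDocIds);
    boolean lateAfterRestore = restored.ingest(new Row("k1", 15, false)); // doc 3
    System.out.println(lateAfterRestore + " " + restored.validDocIds);   // prints "true {3}"
  }
}
```

The second print is the snapshot problem in miniature: once the tombstone is absent from the persisted bitmap, the deleted key comes back with stale data. Persisting tombstone locations alongside the validDocIds snapshot is one possible direction, but that is an assumption here, not something the discussion has settled.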