tibrewalpratik17 opened a new issue, #14305: URL: https://github.com/apache/pinot/issues/14305
Compaction traditionally refers to making something denser or more tightly packed. In its current implementation, the Upsert-Compaction task in Apache Pinot operates at the segment level: it rebuilds individual segments by removing unused or invalid rows. This approach has proven highly effective in controlling the disk usage of upsert tables. However, this proposal focuses on a different issue: the continuously growing number of segments in upsert tables.

To mitigate this challenge, we propose a multi-segment compaction model for upsert tables. In this model, multiple segments are combined and re-uploaded as a single, consolidated segment, with invalid or unused rows removed. This approach aims to reduce the overall segment count while maintaining the storage-efficiency benefits of the current upsert-compaction mechanism.

Sharing the [design doc](https://docs.google.com/document/d/1uzFJggSAxxVpnro5yr-HnWQ-8j5G3EggdSh5bjG78kI/edit?pli=1&tab=t.0) here for review and feedback from the community.
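
To make the proposed model concrete, here is a minimal Python sketch of merging several segments into one while dropping invalid rows. It is an illustration only, not Pinot code: the `Segment` class, the `valid_doc_ids` field, and the `compact_segments` function are all hypothetical names, and the sketch assumes each segment already knows which of its rows are still valid.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Segment:
    """Hypothetical stand-in for a segment: rows plus the indexes still valid."""
    name: str
    rows: List[dict]
    valid_doc_ids: Set[int]  # positions in `rows` not superseded by a newer upsert


def compact_segments(segments: List[Segment], merged_name: str) -> Segment:
    """Combine several segments into one, keeping only the valid rows."""
    merged_rows = []
    for seg in segments:
        for doc_id, row in enumerate(seg.rows):
            if doc_id in seg.valid_doc_ids:
                merged_rows.append(row)
    # Every row that survives the merge is valid in the new segment.
    return Segment(merged_name, merged_rows, set(range(len(merged_rows))))


# Two small segments; the row for id=1 in seg_a was superseded by seg_b.
seg_a = Segment("seg_a", [{"id": 1, "v": 10}, {"id": 2, "v": 20}], {1})
seg_b = Segment("seg_b", [{"id": 1, "v": 11}, {"id": 3, "v": 30}], {0, 1})

merged = compact_segments([seg_a, seg_b], "merged_0")
print(len(merged.rows))  # 3 valid rows, now held in one segment instead of two
```

In this toy example the segment count drops from two to one and the invalidated row is discarded, which is the combined effect the proposal is after. How segments are selected for merging and how the consolidated segment is re-uploaded are covered in the design doc, not in this sketch.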