pedro93 opened a new issue #6912: URL: https://github.com/apache/incubator-pinot/issues/6912
Hello,

This issue requests support for segment compaction on real-time upsert-enabled tables, which does not currently exist, as mentioned in a [slack thread](https://apache-pinot.slack.com/archives/CDRCA57FC/p1620826182368300). This means that segments with old and stale entries are kept on disk and are only deleted when the segment retention policy is triggered.

A concrete example of why this is useful:

- Suppose you have a stream of events related to user activity (updated profile, saw an article, updated preferences, etc.).
- You define a real-time table in Pinot where the primary key is the userId. Segment size is 500k rows and the stream is partitioned.
- The set of users is roughly fixed (~50M).
- You want to keep segments for a fairly long time period (> 2 years).
- Each day ~20% (10M) of the users generate some event, which is consumed by Pinot.

Assuming roughly one event per active user, this generates ~20 segments per day (10M rows / 500k rows per segment). Over the course of 2 years we will have 14,600 segments (20 × 730 days), when in reality we need only 100 segments (50M users / 500k rows per segment) to hold the most up-to-date information for each user.

If the example or issue is not clear, feel free to reach out. Thank you.
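For reference, the arithmetic in the example above can be reproduced with a short Python sketch. It also includes a toy "keep the latest row per primary key" function to illustrate what compaction would need to achieve. Everything here is illustrative: the function names, the `userId` and `ts` field names, and the one-event-per-active-user assumption are mine, and none of this is Pinot's API or its actual upsert implementation.

```python
import math


def segment_counts(total_users, daily_active_users, rows_per_segment, retention_days):
    """Estimate segment counts with and without compaction.

    Assumes one row per active user per day (an assumption of this sketch).
    """
    segments_per_day = math.ceil(daily_active_users / rows_per_segment)
    without_compaction = segments_per_day * retention_days
    # With ideal compaction only the latest row per user is kept.
    with_compaction = math.ceil(total_users / rows_per_segment)
    return segments_per_day, without_compaction, with_compaction


def compact(rows):
    """Keep only the most recent row for each primary key.

    `rows` is an iterable of dicts with hypothetical keys "userId" and "ts".
    On equal timestamps the later row in iteration order wins.
    """
    latest = {}
    for row in rows:
        key = row["userId"]
        if key not in latest or row["ts"] >= latest[key]["ts"]:
            latest[key] = row
    return list(latest.values())


if __name__ == "__main__":
    per_day, stale, ideal = segment_counts(
        total_users=50_000_000,
        daily_active_users=10_000_000,
        rows_per_segment=500_000,
        retention_days=730,
    )
    print(per_day, stale, ideal)
```

With the numbers from the example, the stale layout holds 146 times as many segments as the compacted one, which is the gap this request is about.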