snleee commented on issue #6189: URL: https://github.com/apache/incubator-pinot/issues/6189#issuecomment-724387596
@noahprince22 If we keep only the start timestamp, we cannot prune segments effectively because we don't know each segment's upper bound. Keeping the start time may help for your use case, but it is not a generic solution.

Simple math: let's assume that we store roughly 100 bytes for each segment (we need to store the segment name, start and end timestamps, and some other info).

```
100 bytes/segment * 20 million segments = ~2GB
```

It does require GBs of memory; however, having 20 million segments in a Pinot cluster is a rather extreme use case. If you set your segment size to something reasonable (200-300MB per segment), you won't have 20 million segments (200MB * 20 million segments = 4PB). To support that many segments, we would probably need to read the metadata from disk instead of keeping everything in memory.

IMO, we can first start with what @jtao15 suggested and see how it performs on your use case. What do you think?
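To make the pruning argument and the arithmetic above concrete, here is a minimal Python sketch. It is a toy model, not Pinot code: `SegmentMeta`, `prune`, and `metadata_bytes` are hypothetical names invented for illustration, and the 100 bytes per segment figure is the rough assumption from the comment, not a measured value.

```python
from dataclasses import dataclass
from typing import List, Optional

# Rough per-segment metadata size assumed in the comment (not measured).
BYTES_PER_SEGMENT = 100


@dataclass
class SegmentMeta:
    # Hypothetical record for illustration; not Pinot's actual metadata class.
    name: str
    start_ms: int
    end_ms: Optional[int]  # None models "only the start timestamp is kept"


def metadata_bytes(num_segments: int, bytes_per_segment: int = BYTES_PER_SEGMENT) -> int:
    """Back-of-envelope memory needed to hold metadata for all segments."""
    return num_segments * bytes_per_segment


def prune(segments: List[SegmentMeta], query_start_ms: int, query_end_ms: int) -> List[str]:
    """Return names of segments that may overlap [query_start_ms, query_end_ms]."""
    kept = []
    for seg in segments:
        if seg.start_ms > query_end_ms:
            continue  # starts after the query window: prunable from the start time alone
        if seg.end_ms is not None and seg.end_ms < query_start_ms:
            continue  # ends before the window: only provable when the upper bound is known
        kept.append(seg.name)
    return kept


with_end = [
    SegmentMeta("seg_0", 0, 99),
    SegmentMeta("seg_1", 100, 199),
    SegmentMeta("seg_2", 200, 299),
]
# Same segments, but with the end timestamp dropped.
start_only = [SegmentMeta(s.name, s.start_ms, None) for s in with_end]

pruned_with_end = prune(with_end, 150, 180)
pruned_start_only = prune(start_only, 150, 180)
print(pruned_with_end)    # ['seg_1']
print(pruned_start_only)  # ['seg_0', 'seg_1']

# 100 bytes/segment * 20 million segments, in decimal GB.
metadata_gb = metadata_bytes(20_000_000) / 1e9
# 200 MB/segment * 20 million segments, converted from MB to decimal PB.
data_pb = 200 * 20_000_000 / 1e9
print(metadata_gb, data_pb)  # 2.0 4.0
```

In this toy model, the start-only variant can still skip segments that begin after the query window, but it has to keep every earlier segment because nothing proves that it ended before the window began.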