snleee commented on issue #6189:
URL: 
https://github.com/apache/incubator-pinot/issues/6189#issuecomment-724387596


   @noahprince22 If we keep only the start timestamp, we cannot prune segments 
effectively because we don't know each segment's upper bound. Keeping just the 
start time may help your use case, but it's not a generic solution. 
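
To illustrate the pruning point, here is a minimal sketch (not Pinot's actual pruner; `SegmentTimeInfo`, `prune`, and the sample segments are hypothetical names made up for this example). A segment with only a start time can be dropped when it starts after the query window ends, but never because it ended before the window began.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SegmentTimeInfo:
    """Hypothetical per-segment time metadata (illustration only)."""
    name: str
    start_ms: int
    end_ms: Optional[int]  # None models "only the start timestamp is stored"


def prune(segments: List[SegmentTimeInfo], query_start_ms: int, query_end_ms: int) -> List[str]:
    """Return names of segments that may overlap [query_start_ms, query_end_ms]."""
    kept = []
    for seg in segments:
        # Lower-bound check: works with the start timestamp alone.
        if seg.start_ms > query_end_ms:
            continue
        # Upper-bound check: only possible when the end timestamp is known.
        if seg.end_ms is not None and seg.end_ms < query_start_ms:
            continue
        kept.append(seg.name)
    return kept


# Three consecutive segments, once with both bounds and once with start only.
with_end = [
    SegmentTimeInfo("seg_0", 0, 99),
    SegmentTimeInfo("seg_1", 100, 199),
    SegmentTimeInfo("seg_2", 200, 299),
]
start_only = [SegmentTimeInfo(s.name, s.start_ms, None) for s in with_end]

print(prune(with_end, 250, 260))    # only the overlapping segment survives
print(prune(start_only, 250, 260))  # every earlier segment must be kept
```

With start times only, any query on recent data has to keep every older segment, which is why the end timestamp is needed for a generic solution.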
   
   Simple math:
   
   Let's assume that we store roughly 100 bytes per segment (we need to 
store the segment name, start & end timestamps, and some other info).
   ```
   100 bytes/segment * 20 million segments = ~2 GB
   ```
   
   It does require GBs of memory; however, having 20 million segments in a 
single Pinot cluster is a rather extreme use case. If you set your segment size 
to something reasonable (200-300 MB per segment), you won't have 20 million 
segments (200 MB * 20 million segments = 4 PB). To support that many segments, 
we would probably need to read the metadata from disk instead of keeping 
everything in memory.
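
The two estimates above can be reproduced with a short back-of-the-envelope script. This is only a sketch: the 100 bytes per segment figure is the rough assumption from this comment, the function names are made up for the example, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15


def metadata_memory_gb(num_segments: int, bytes_per_segment: int = 100) -> float:
    """In-memory metadata footprint in GB, assuming a fixed cost per segment."""
    return num_segments * bytes_per_segment / BYTES_PER_GB


def total_data_pb(num_segments: int, segment_size_mb: int) -> float:
    """Total data volume in PB implied by a segment count and segment size."""
    return num_segments * segment_size_mb * BYTES_PER_MB / BYTES_PER_PB


segments = 20_000_000
print(metadata_memory_gb(segments))   # 2.0 (GB of metadata)
print(total_data_pb(segments, 200))   # 4.0 (PB of data at 200 MB per segment)
print(total_data_pb(segments, 300))   # 6.0 (PB of data at 300 MB per segment)
```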
   
   IMO, we can start with what @jtao15 suggested and see how it performs 
on your use case. What do you think?


