noahprince22 opened a new issue #6187:
URL: https://github.com/apache/incubator-pinot/issues/6187


   Some discussion already here:
   https://apache-pinot.slack.com/archives/CDRCA57FC/p1603720037246100
   
   This would involve modifying the Pinot server to include a lazy mode that 
lazily pulls segments from deep store as they are queried, managed by an LRU 
cache. It should take only some modification to the SegmentDataManager and 
maybe the table data manager.
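   A minimal sketch of the lazy-loading idea, assuming a capped LRU cache of 
locally materialized segments. The class and method names here 
(`LazySegmentCache`, `downloadFromDeepStore`, `unload`) are hypothetical 
placeholders, not actual Pinot APIs; a real change would live inside the 
SegmentDataManager:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache of loaded segments keyed by segment name.
// On a miss, the segment would be pulled from deep store (e.g. s3) and loaded
// locally; eviction unloads the least recently used segment to cap disk usage.
public class LazySegmentCache {
    private final Map<String, String> cache; // segmentName -> local path of the loaded segment

    public LazySegmentCache(int maxSegments) {
        // accessOrder=true makes iteration order reflect recency of access (LRU)
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                boolean evict = size() > maxSegments;
                if (evict) {
                    unload(eldest.getKey());
                }
                return evict;
            }
        };
    }

    public String getOrLoad(String segmentName) {
        String localPath = cache.get(segmentName);
        if (localPath == null) {
            localPath = downloadFromDeepStore(segmentName);
            cache.put(segmentName, localPath); // put() triggers removeEldestEntry
        }
        return localPath;
    }

    // Placeholder for fetching the segment tarball from s3 and untarring it locally.
    private String downloadFromDeepStore(String segmentName) {
        return "/tmp/segments/" + segmentName;
    }

    // Placeholder for releasing the local copy of an evicted segment.
    private void unload(String segmentName) {
    }

    public int size() {
        return cache.size();
    }
}
```

   With a capacity of N segments, loading segment N+1 evicts the least 
recently queried one, so local disk stays bounded regardless of table size.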
   
   This would allow using s3 as the primary storage, with Pinot as the 
query/caching layer for long-term historical tiers of data. Similar to the 
tiering example, you’d have a third set of lazy servers for reading data older 
than 2 weeks. This is explicitly to avoid large EBS volume costs for very 
large datasets.
   
   My main concern is this: a moderately sized dataset for us is 130 GB a day. 
We have some that can be in the terabyte range per day. Using 500 MB segments, 
you’re looking at ~260 segments a day, or ~95k segments a year. In this case, 
broker pruning is very important, because any segment query sent to the lazy 
server means materializing data from s3. This data is mainly time series, 
which means segments would be in time-bound chunks. Does the Pinot broker 
prune segments by time? How is the broker managing segments? Does it just have 
an in-memory list of all segments for all tables? If so, metadata pruning will 
become a bottleneck for us on most queries. I’d like to see query time scale 
logarithmically with the size of the data.
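   To illustrate the kind of pruning I mean, here is a hedged sketch, assuming 
each segment's metadata carries a [startMillis, endMillis] time range and the 
broker keeps segments sorted by start time. The names (`TimeSegmentPruner`, 
`SegmentMeta`) are hypothetical, not Pinot's actual broker structures; the 
point is that a binary search over the query window avoids scanning the full 
per-table segment list:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of time-based segment pruning at the broker: segments
// are kept sorted by start time, and a query window is resolved with a binary
// search rather than a scan over all segment metadata.
public class TimeSegmentPruner {
    public record SegmentMeta(String name, long startMillis, long endMillis) {}

    private final List<SegmentMeta> segmentsByStart; // sorted by startMillis

    public TimeSegmentPruner(List<SegmentMeta> segments) {
        segmentsByStart = new ArrayList<>(segments);
        segmentsByStart.sort(Comparator.comparingLong(SegmentMeta::startMillis));
    }

    // Returns segments whose time range overlaps [queryStart, queryEnd].
    public List<SegmentMeta> prune(long queryStart, long queryEnd) {
        // Binary search for the first segment starting after queryEnd;
        // everything at or past that index cannot overlap the window.
        int lo = 0, hi = segmentsByStart.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (segmentsByStart.get(mid).startMillis() <= queryEnd) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<SegmentMeta> overlapping = new ArrayList<>();
        for (int i = 0; i < lo; i++) {
            SegmentMeta s = segmentsByStart.get(i);
            if (s.endMillis() >= queryStart) {
                overlapping.add(s);
            }
        }
        return overlapping;
    }
}
```

   With daily time-bound segments, a query over one day would touch one or two 
segments instead of all ~95k, which is roughly the logarithmic scaling I’m 
hoping for.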


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


