suddendust commented on issue #7229:
URL: https://github.com/apache/pinot/issues/7229#issuecomment-892919348


   Thanks @mcvsubbu for the review and the pointers. As for the FSM, this is 
the new state I proposed:
   
   <img width="796" alt="Screenshot 2021-08-05 at 12 00 38 AM" 
src="https://user-images.githubusercontent.com/84911643/128235324-23760e70-861d-43ca-b6ab-a6e338ba20b6.png">
   
   I introduced a new state because it looked like a cleaner way for the 
broker to determine that a segment has been purged and moved to the 
deep-store. The broker can certainly also determine this by first checking 
that the segment is absent, and then looking at its S3 location in the 
segment config; the code would just be a bit less clean in that case. 
With the new state, I was thinking of defining an invariant that a segment 
moves to this state _iff_ it was successfully uploaded AND its URL was 
successfully updated in its metadata. The broker could then be sure that the 
segment was actually uploaded just by looking at the new state, which can 
help when the deep-store was 
[bypassed](https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion)
 during commit for some reason. On second thought, though, this appears to add 
unnecessary complexity.
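   
   To make the two options concrete, here is a rough sketch of the invariant 
and of the alternative broker-side check. All names here (`PURGED`, 
`nextState`, `brokerSeesAsPurged`) are hypothetical and only for 
illustration; none of them are existing Pinot APIs or Helix states.

```java
import java.util.Optional;

// Illustrative sketch only: these names are assumptions, not Pinot code.
public class PurgedStateSketch {

    // PURGED stands in for the proposed new state; the real name is not decided.
    public enum SegmentState { ONLINE, OFFLINE, PURGED }

    // Option 1: the new state. A segment moves to PURGED iff the upload
    // succeeded AND the download URL was recorded in its metadata.
    public static SegmentState nextState(SegmentState current,
                                         boolean uploadSucceeded,
                                         boolean downloadUrlRecorded) {
        if (current == SegmentState.ONLINE && uploadSucceeded && downloadUrlRecorded) {
            return SegmentState.PURGED;
        }
        // Either precondition failed: stay put so the invariant holds.
        return current;
    }

    // Option 2: no new state. The broker infers the same fact by checking
    // that the segment is absent and that a deep-store location is recorded.
    public static boolean brokerSeesAsPurged(boolean presentOnServers,
                                             Optional<String> downloadUrl) {
        return !presentOnServers && downloadUrl.isPresent();
    }
}
```

   With option 1 the broker only reads the state; with option 2 it has to 
combine two lookups, which is the "less clean" part.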
   
   We haven't really done a cost comparison of lazy-loading vs. `mmap` on local 
Pinot servers, but I'll throw some numbers out here. Our ingestion rate is 
increasing quite rapidly, and we're looking at around 4-5 TB/day of data in the 
next few months (these are conservative numbers). With a retention period of 30 
days (again, a minimum), we'll have to store up to 150 TB worth of segments on 
SSDs at any time. Storage costs can be prohibitive with this much data, not to 
mention that all of this is to serve a small fraction of queries (< 10%). We'll 
try to do a proper cost analysis of this today.
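   
   For reference, the arithmetic behind that figure, as a small sketch. It 
assumes the figures are in terabytes and ignores replication and compression, 
both of which would change the real footprint; the class and method names are 
made up for illustration.

```java
// Back-of-the-envelope estimate only; not a cost analysis.
public class StorageEstimate {

    // Data held on local disk at steady state: daily ingest times retention.
    public static double retainedTerabytes(double ingestTbPerDay, int retentionDays) {
        return ingestTbPerDay * retentionDays;
    }

    public static void main(String[] args) {
        System.out.println(retainedTerabytes(4, 30)); // prints 120.0
        System.out.println(retainedTerabytes(5, 30)); // prints 150.0
    }
}
```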
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


