suddendust commented on issue #7229: URL: https://github.com/apache/pinot/issues/7229#issuecomment-892919348
Thanks @mcvsubbu for the review and the pointers.

As for the FSM, this is the new state I proposed:

<img width="796" alt="Screenshot 2021-08-05 at 12 00 38 AM" src="https://user-images.githubusercontent.com/84911643/128235324-23760e70-861d-43ca-b6ab-a6e338ba20b6.png">

I introduced a new state because it looked like a cleaner way for the broker to determine that a segment has been purged and moved to the deep-store. The broker can certainly also determine this by first checking that the segment is absent and then looking at its S3 location in the segment config; the code will just be a bit less clean in that case.

With the new state, I was thinking of defining an invariant that a segment moves to this state _iff_ it was successfully uploaded AND its URL was successfully updated in its metadata. The broker could then be sure that the segment was actually uploaded just by looking at the new state, which can be helpful when the deep-store was [bypassed](https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion) during commit for some reason. But on second thought, it appears this adds unnecessary complexity.

We haven't really done a cost comparison of lazy-loading vs. `mmap` on local Pinot servers, but I'll throw some numbers here. Our ingestion rate is increasing quite rapidly, and we're looking at around 4-5T/day of data in the next few months (these are conservative numbers). With a retention period of 30 days (again, a minimum), we'll have to store about 150T worth of segments on SSDs at any time. Storage costs can be prohibitive with this much data, not to mention that all of this would serve only a tiny fraction of queries (< 10%). We'll try to do a proper cost analysis of this today.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
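The storage figure quoted in the comment can be reproduced with a short calculation. This is only a rough sketch: it assumes "T" means terabytes, that ingestion stays constant over the retention window, and it ignores compression and index overhead. The function name and the `replicas` parameter are hypothetical and are not part of Pinot.

```python
def steady_state_storage(daily_ingest_t: float, retention_days: int, replicas: int = 1) -> float:
    """Approximate data held on servers at any time, in the same unit as daily_ingest_t.

    Assumes a constant ingestion rate; `replicas` is an illustrative knob
    (the comment's 150T figure corresponds to replicas=1).
    """
    return daily_ingest_t * retention_days * replicas


if __name__ == "__main__":
    # Range quoted in the comment: 4-5T/day with a 30-day retention period.
    for daily in (4, 5):
        total = steady_state_storage(daily, retention_days=30)
        print(f"{daily}T/day x 30 days = {total}T")
```

With the quoted range this gives 120T to 150T for a single copy of the data; any replication multiplies that figure accordingly.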
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org