shounakmk219 opened a new pull request, #13584: URL: https://github.com/apache/pinot/pull/13584
# Description

This PR allows Pinot to impose storage quota restrictions on realtime tables. The design for blocking ingestion on realtime tables keeps the following considerations in mind:

1. Replicas should be consistent.
2. Ingestion should auto-resume when the quota is increased or storage is freed up.
3. Blocked ingestion should be easy to observe, with relevant info.
4. Query results should be consistent.

Also, to keep it simple, ingestion is blocked only when storage quotas are breached.

# Blocking ingestion

- The storage quota is imposed only upon successful completion of a consuming segment.
- `SegmentCompletionManager` first calculates whether the table is within quota.
- If the table is breaching its quota, the `IS_QUOTA_EXCEEDED` flag is set on the table IS (ideal state).
- This flag is checked when a new consuming segment is created.
- If it is `true`, new consuming segments are not created.
- Ongoing segment commits are unaffected.

## What if the `IS_QUOTA_EXCEEDED` flag is set to `false` manually in ZK directly?

Setting the `IS_QUOTA_EXCEEDED` flag to `false` alone will not create new consuming segments. New consuming segments can be created by:

1. Calling the `/resumeConsumption` API on the controller
2. Running the `RealtimeSegmentValidationManager` job with `recreateDeletedConsumingSegment` set to `true`

The resume consumption API internally depends on the `RealtimeSegmentValidationManager` task to create the consuming segments. Hence, to cover both paths, the quota check only needs to be handled in `RealtimeSegmentValidationManager`. `RealtimeSegmentValidationManager` checks the storage quota again and sets the `IS_QUOTA_EXCEEDED` flag to the correct value.

# Resuming ingestion

Once a table exceeds its storage quota, consumption may need to resume in the following cases:

1. The storage quota is extended for the table
2. Storage frees up when older segments are deleted (by the retention manager)

To automate the resume flow, we rely on the `RealtimeSegmentValidationManager` job itself. If the user does not want to wait for the job to run, they can trigger it manually from the controller API or use the resume consumption endpoint.
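
The block and resume flow described above can be modelled with a small, self-contained sketch. This is not the code in this PR: `QuotaGateSketch`, `TableState`, and every method name below are hypothetical stand-ins, and the sketch assumes "breaching quota" means used bytes strictly greater than the quota. It only illustrates that the flag is recomputed on segment completion and by the validation job, so a manual flip of the flag neither creates segments nor survives the next validation run.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the quota gate; these are not Pinot's actual classes. */
class QuotaGateSketch {

  /** Stand-in for the per-table state that Pinot keeps in ZK. */
  static class TableState {
    long storageQuotaBytes;
    long usedBytes;
    boolean isQuotaExceeded; // models the IS_QUOTA_EXCEEDED flag
    int nextSequence = 0;
    final List<String> consumingSegments = new ArrayList<>();

    TableState(long storageQuotaBytes) {
      this.storageQuotaBytes = storageQuotaBytes;
    }
  }

  /** Recomputes the flag from current usage (assumption: breach means used > quota). */
  static boolean refreshQuotaFlag(TableState t) {
    t.isQuotaExceeded = t.usedBytes > t.storageQuotaBytes;
    return t.isQuotaExceeded;
  }

  static void createConsumingSegment(TableState t) {
    t.consumingSegments.add("segment_" + t.nextSequence++);
  }

  /** Models segment completion: the commit always goes through, the next segment may not. */
  static void onSegmentCommitted(TableState t, String segment, long sizeBytes) {
    t.consumingSegments.remove(segment);
    t.usedBytes += sizeBytes;
    if (!refreshQuotaFlag(t)) {
      createConsumingSegment(t);
    }
  }

  /** Models the periodic validation job: re-check the quota, then repair consumption. */
  static void runValidation(TableState t) {
    boolean exceeded = refreshQuotaFlag(t);
    if (!exceeded && t.consumingSegments.isEmpty()) {
      createConsumingSegment(t);
    }
  }

  public static void main(String[] args) {
    TableState t = new TableState(100);
    createConsumingSegment(t);               // segment_0 starts consuming
    onSegmentCommitted(t, "segment_0", 60);  // 60 <= 100: segment_1 is created
    onSegmentCommitted(t, "segment_1", 60);  // 120 > 100: flag set, ingestion blocked
    t.isQuotaExceeded = false;               // manual flip: no segment is created
    runValidation(t);                        // flag is recomputed back to true
    t.storageQuotaBytes = 200;               // quota increase
    runValidation(t);                        // flag cleared, segment_2 is created
    System.out.println(t.isQuotaExceeded + " " + t.consumingSegments); // false [segment_2]
  }
}
```

In this model the flag is always derived from usage and quota rather than trusted as stored, which is why both resume paths (quota increase and freed storage) need no extra handling beyond the next validation run.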