shounakmk219 opened a new pull request, #13584:
URL: https://github.com/apache/pinot/pull/13584

   # Description
   
   This PR allows Pinot to impose storage quota restrictions on realtime tables.
   Blocking ingestion on realtime tables is designed with the following considerations in mind:
   
   1. Replicas should be consistent.
   2. Ingestion should resume automatically when the quota is increased or storage is freed up.
   3. Blocked ingestion should be easy to observe, with relevant info exposed.
   4. Query results should be consistent.
   
   Also, to keep it simple, ingestion is blocked only when storage quotas are breached.
   
   # Blocking ingestion
   
   - The storage quota is enforced only upon successful completion of a consuming segment.
   - `SegmentCompletionManager` first calculates whether the table is within quota.
   - If the table is breaching its quota, the `IS_QUOTA_EXCEEDED` flag is set on the table IS (ideal state).
   - This flag is checked when a new consuming segment is about to be created.
   - If it is `true`, new consuming segments are not created.
   - Any ongoing segment commit is unaffected.
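
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TableStateSketch` and `QuotaGateSketch` are hypothetical stand-ins for the table IS and for the check done by `SegmentCompletionManager`, and treating "size strictly greater than quota" as a breach is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the table IS; not Pinot's real class.
final class TableStateSketch {
  static final String IS_QUOTA_EXCEEDED = "IS_QUOTA_EXCEEDED";
  private final Map<String, String> fields = new HashMap<>();

  void setQuotaExceeded(boolean exceeded) {
    fields.put(IS_QUOTA_EXCEEDED, Boolean.toString(exceeded));
  }

  boolean isQuotaExceeded() {
    // A missing flag is treated as "not exceeded" (assumption).
    return Boolean.parseBoolean(fields.getOrDefault(IS_QUOTA_EXCEEDED, "false"));
  }
}

// Hypothetical helper modelling the two checks described in the list above.
final class QuotaGateSketch {
  private final long quotaBytes;

  QuotaGateSketch(long quotaBytes) {
    this.quotaBytes = quotaBytes;
  }

  // Called after a consuming segment has completed successfully:
  // compute whether the table is within quota and record the result.
  void updateQuotaFlag(TableStateSketch state, long tableSizeBytes) {
    state.setQuotaExceeded(tableSizeBytes > quotaBytes);
  }

  // Called when a new consuming segment is about to be created:
  // creation is skipped while the flag is true.
  boolean shouldCreateConsumingSegment(TableStateSketch state) {
    return !state.isQuotaExceeded();
  }
}
```

In this sketch the commit of the segment that just completed is never touched; only the creation of the next consuming segment is gated on the flag.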
   
   ## What if the `IS_QUOTA_EXCEEDED` flag is manually set to `false` directly in ZK?
   
   Setting the `IS_QUOTA_EXCEEDED` flag to `false` will not, by itself, create new consuming segments. New consuming segments can be created by:
   
   1. Calling the `/resumeConsumption` API on the controller
   2. Running the `RealtimeSegmentValidationManager` job with `recreateDeletedConsumingSegment` set to `true`
   
   The resume consumption API internally depends on the `RealtimeSegmentValidationManager` task to create the consuming segments. Hence, to cover both paths, the quota check only needs to be handled in `RealtimeSegmentValidationManager`.
   `RealtimeSegmentValidationManager` checks the storage quota again and sets the `IS_QUOTA_EXCEEDED` flag to the correct value.
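
A rough sketch of that behaviour, under stated assumptions: `ValidationRunSketch` and `TableSnapshot` are hypothetical names, not the classes in this PR, and the real job does considerably more than this. The point illustrated is only that the flag is recomputed from the actual table size, so a manually edited value does not survive a validation run.

```java
// Hypothetical model of one validation run for a single table.
final class ValidationRunSketch {

  // Hypothetical snapshot of the state the run looks at.
  static final class TableSnapshot {
    long sizeBytes;
    long quotaBytes;
    boolean quotaExceededFlag;   // current IS_QUOTA_EXCEEDED value, possibly edited by hand
    boolean hasConsumingSegment;
  }

  // Returns true if this run created a new consuming segment.
  static boolean runOnce(TableSnapshot table, boolean recreateDeletedConsumingSegment) {
    // Recompute the flag from the real size, ignoring whatever value was stored.
    table.quotaExceededFlag = table.sizeBytes > table.quotaBytes;
    if (table.quotaExceededFlag) {
      return false; // still over quota: do not create a consuming segment
    }
    if (!table.hasConsumingSegment && recreateDeletedConsumingSegment) {
      table.hasConsumingSegment = true; // stands in for the real segment creation
      return true;
    }
    return false;
  }
}
```

For example, if the flag was manually flipped to `false` while the table is still over quota, a run resets it to `true` and creates nothing; once the quota is raised or storage is freed, the next run clears the flag and recreates the consuming segment.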
   
   # Resuming ingestion
   
   Once a table exceeds the storage quota, consumption may need to be resumed in the following cases:
   
   1. The storage quota is increased for the table
   2. Storage frees up when older segments are deleted (by the retention manager)
   
   To automate the resume consumption flow, we rely on the `RealtimeSegmentValidationManager` job itself.
   If the user does not want to wait for the next scheduled run, they can manually trigger the job through the controller API or use the resume consumption endpoint directly.
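
As an illustration of the manual route, the request to the resume consumption endpoint could be built as below. This is a sketch under assumptions: the `/tables/{tableName}` path prefix, the `POST` method, the controller address, and the table name are not taken from this PR and should be checked against the controller's API documentation. The request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; builds (but does not send) a resume consumption request.
final class ResumeConsumptionRequestSketch {

  static HttpRequest build(String controllerBaseUrl, String tableName) {
    // Assumed endpoint shape: POST {controller}/tables/{tableName}/resumeConsumption
    URI uri = URI.create(controllerBaseUrl + "/tables/" + tableName + "/resumeConsumption");
    return HttpRequest.newBuilder(uri)
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
  }
}
```

A caller would pass the result to `java.net.http.HttpClient#send`, for example with `ResumeConsumptionRequestSketch.build("http://localhost:9000", "myTable")`, where both arguments are placeholders.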
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

