lnbest0707-uber opened a new pull request, #16724:
URL: https://github.com/apache/pinot/pull/16724

   `real-time` `ingestion` `feature`
   Part 3 of https://github.com/apache/pinot/pull/15782
   Issue https://github.com/apache/pinot/issues/14815
   Design doc: https://docs.google.com/document/d/1NKPeNh6V2ctaQ4T_X3OKJ6Gcy5TRanLiU1uIDT8_9UA/edit?usp=sharing
   
   Once an offset auto reset (introduced in [1/3](https://github.com/apache/pinot/pull/16492)) happens, users can choose to backfill the skipped data. To enable this, add the following configs:
   
   - `controller.realtime.offsetAutoReset.backfill.enabled = true` in the controller config.
   - `realtimeOffsetAutoResetHandlerClass` in `streamIngestionConfig`, which determines how the backfill is handled. The handler is a plugin that implements the defined interfaces.
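As a rough illustration of how the two configs could gate backfill, here is a minimal Java sketch. It assumes the controller config is available as `java.util.Properties` and `streamIngestionConfig` as a flat string map. The interface `OffsetAutoResetHandler`, the classes `NoOpBackfillHandler` and `BackfillConfigSketch`, and the `triggerBackfill` signature are hypothetical, not the interfaces defined in this PR; only the two config keys come from the description above.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical plugin contract; the interface actually defined in the PR may differ.
interface OffsetAutoResetHandler {
    void triggerBackfill(String tableName, String topic, long fromOffset, long toOffset);
}

// Hypothetical do-nothing handler, used only to exercise the sketch.
class NoOpBackfillHandler implements OffsetAutoResetHandler {
    @Override
    public void triggerBackfill(String tableName, String topic, long fromOffset, long toOffset) {
        // Intentionally empty.
    }
}

final class BackfillConfigSketch {
    // Controller config key named in the PR description.
    static final String BACKFILL_ENABLED_KEY =
        "controller.realtime.offsetAutoReset.backfill.enabled";
    // Key under streamIngestionConfig named in the PR description.
    static final String HANDLER_CLASS_KEY = "realtimeOffsetAutoResetHandlerClass";

    private BackfillConfigSketch() {}

    // Returns a handler when backfill is enabled and a handler class is configured, else null.
    static OffsetAutoResetHandler createHandler(
            Properties controllerConf, Map<String, String> streamIngestionConfig) {
        boolean enabled =
            Boolean.parseBoolean(controllerConf.getProperty(BACKFILL_ENABLED_KEY, "false"));
        String className = streamIngestionConfig.get(HANDLER_CLASS_KEY);
        if (!enabled || className == null || className.isEmpty()) {
            return null;
        }
        try {
            // Load the plugin class by name and invoke its no-arg constructor.
            return Class.forName(className)
                .asSubclass(OffsetAutoResetHandler.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate handler: " + className, e);
        }
    }
}
```

Loading the class by name is what lets the handler act as a plugin: a table could point at any implementation on the controller classpath without changing controller code.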
   
   The `RealtimeOffsetAutoResetManager` can:
   
   - Construct the handler according to the user's config.
   - Trigger the backfill once the corresponding Helix message/event is sent after a reset happens.
   - Periodically scan table configs to find existing backfill topics.
   - Check the backfill status through the handler and trigger cleanup if needed.
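The manager duties listed above could be wired together roughly as follows. This is a minimal, single-threaded sketch, not the PR's code: `BackfillHandler`, `OffsetAutoResetManagerSketch`, and all method names are hypothetical, the reset event is modeled as a plain method call instead of a Helix message, and an in-memory map stands in for scanning table configs for backfill topics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical handler contract mirroring the manager duties above.
interface BackfillHandler {
    // Starts a backfill for the skipped range and returns the backfill topic name.
    String triggerBackfill(String tableName, String topic, long fromOffset, long toOffset);

    boolean isBackfillCompleted(String tableName, String backfillTopic);

    void cleanup(String tableName, String backfillTopic);
}

final class OffsetAutoResetManagerSketch {
    // One handler per table, built from that table's config.
    private final Map<String, BackfillHandler> handlersByTable = new HashMap<>();
    // In-memory stand-in for the backfill topics found by scanning table configs.
    private final Map<String, List<String>> backfillTopicsByTable = new HashMap<>();

    void registerHandler(String tableName, BackfillHandler handler) {
        handlersByTable.put(tableName, handler);
    }

    // Invoked when the reset event for a table arrives.
    void onOffsetAutoReset(String tableName, String topic, long fromOffset, long toOffset) {
        BackfillHandler handler = handlersByTable.get(tableName);
        if (handler == null) {
            return; // Backfill is not configured for this table.
        }
        String backfillTopic = handler.triggerBackfill(tableName, topic, fromOffset, toOffset);
        backfillTopicsByTable.computeIfAbsent(tableName, t -> new ArrayList<>()).add(backfillTopic);
    }

    // One run of the periodic task: clean up and forget every completed backfill.
    void runPeriodicCheck() {
        for (Map.Entry<String, List<String>> entry : backfillTopicsByTable.entrySet()) {
            String tableName = entry.getKey();
            BackfillHandler handler = handlersByTable.get(tableName);
            entry.getValue().removeIf(backfillTopic -> {
                if (!handler.isBackfillCompleted(tableName, backfillTopic)) {
                    return false;
                }
                handler.cleanup(tableName, backfillTopic);
                return true;
            });
        }
    }

    List<String> backfillTopics(String tableName) {
        return backfillTopicsByTable.getOrDefault(tableName, List.of());
    }
}
```

In a real controller, `runPeriodicCheck` would presumably be driven by a periodic task rather than called by hand.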
   
   In addition, the PR introduces an alternative abstract implementation of backfill based on multi-topic ingestion:
   
   - Ask the Kafka ecosystem to replicate the skipped offsets once a reset happens.
   - Add the replicated topic to the table or, if it has already been added, resume its consumption if needed.
   - Pause the backfill topic's consumption once the backfill completes.
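The multi-topic flow above can be sketched as an abstract class whose replication and progress checks are left to subclasses. This is an illustration under stated assumptions: `MultiTopicBackfillSketch`, `replicate`, `isCaughtUp`, and the in-memory topic map (topic name mapped to a paused flag) are hypothetical, and neither the actual replication (e.g. via a Kafka mirroring tool) nor Pinot's real pause/resume calls are modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstract handler for multi-topic backfill; not the PR's class.
abstract class MultiTopicBackfillSketch {
    // Topics ingested by the table, mapped to whether their consumption is paused.
    protected final Map<String, Boolean> tableTopicsPaused = new LinkedHashMap<>();

    // Asks the Kafka side to replicate the skipped offsets; returns the replicated topic name.
    protected abstract String replicate(String sourceTopic, long fromOffset, long toOffset);

    // Whether the table has consumed everything in the replicated topic.
    protected abstract boolean isCaughtUp(String backfillTopic);

    // On reset: replicate, then add the topic or resume it if it was already added.
    final String onReset(String sourceTopic, long fromOffset, long toOffset) {
        String backfillTopic = replicate(sourceTopic, fromOffset, toOffset);
        tableTopicsPaused.put(backfillTopic, false); // Added or resumed (not paused).
        return backfillTopic;
    }

    // Periodic check: pause an active backfill topic once it has caught up.
    final void checkBackfill(String backfillTopic) {
        if (Boolean.FALSE.equals(tableTopicsPaused.get(backfillTopic))
                && isCaughtUp(backfillTopic)) {
            tableTopicsPaused.put(backfillTopic, true);
        }
    }

    boolean hasTopic(String topic) {
        return tableTopicsPaused.containsKey(topic);
    }

    boolean isPaused(String topic) {
        return Boolean.TRUE.equals(tableTopicsPaused.get(topic));
    }
}
```

Leaving `replicate` abstract keeps the Kafka-side mechanism deployment-specific, while the add/resume/pause bookkeeping stays shared.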


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.