lnbest0707-uber opened a new pull request, #16724: URL: https://github.com/apache/pinot/pull/16724
`real-time` `ingestion` `feature`

Part 3 of https://github.com/apache/pinot/pull/15782
Issue: https://github.com/apache/pinot/issues/14815
Design doc: https://docs.google.com/document/d/1NKPeNh6V2ctaQ4T_X3OKJ6Gcy5TRanLiU1uIDT8_9UA/edit?usp=sharing

Once the offset auto reset (introduced in [1/3](https://github.com/apache/pinot/pull/16492)) happens, users can choose to backfill the skipped in-between data. To enable that, add the following configs:
- `controller.realtime.offsetAutoReset.backfill.enabled = true` in the controller config.
- `realtimeOffsetAutoResetHandlerClass` in `streamIngestionConfig`, which determines how the backfill is handled. This is a plugin with defined interfaces.

The `RealtimeOffsetAutoResetManager` can:
- Construct the handler according to the user's config.
- Trigger the backfill once the corresponding Helix message/event is sent after a reset has happened.
- Periodically scan the table config to find the existing backfill topics.
- Afterwards, check the backfill status through the handler and trigger the cleanup if needed.

In addition, the PR introduces an alternative abstract implementation of backfill based on multi-topic ingestion:
- Ask the Kafka ecosystem to replicate the skipped offsets once a reset happens.
- Add the replicated topic to the table or, if it has already been added, resume its consumption if needed.
- Pause the backfill topic's consumption once the backfill completes.
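
The manager responsibilities described above (trigger a backfill after a reset, periodically check its status, clean up when it completes) can be illustrated with a minimal, self-contained Java sketch. Every name in it (`BackfillHandler`, `InMemoryBackfillHandler`, `BackfillManagerSketch`, and all method signatures) is hypothetical and chosen for illustration; it is not the interface or the `RealtimeOffsetAutoResetManager` code from this PR. The sketch also assumes handlers are registered directly and that backfill topics are tracked in memory, whereas the PR describes constructing the handler from `realtimeOffsetAutoResetHandlerClass` and scanning the table config.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the pluggable handler; NOT the PR's actual interface.
interface BackfillHandler {
  // Starts a backfill for the skipped range [fromOffset, toOffset) and returns the backfill topic name.
  String triggerBackfill(String table, String topic, int partition, long fromOffset, long toOffset);

  boolean isBackfillComplete(String table, String backfillTopic);

  void cleanUp(String table, String backfillTopic);
}

// Toy in-memory handler, used only to exercise the sketch.
class InMemoryBackfillHandler implements BackfillHandler {
  final Set<String> completed = new HashSet<>();
  final List<String> cleaned = new ArrayList<>();

  @Override
  public String triggerBackfill(String table, String topic, int partition, long fromOffset, long toOffset) {
    return topic + "_backfill_" + partition + "_" + fromOffset + "_" + toOffset;
  }

  @Override
  public boolean isBackfillComplete(String table, String backfillTopic) {
    return completed.contains(backfillTopic);
  }

  @Override
  public void cleanUp(String table, String backfillTopic) {
    cleaned.add(backfillTopic);
  }
}

// Hypothetical manager that mirrors the responsibilities listed in the description.
class BackfillManagerSketch {
  private final boolean backfillEnabled; // assumed to mirror the controller-level flag
  private final Map<String, BackfillHandler> handlers = new HashMap<>();
  private final Map<String, List<String>> activeTopics = new HashMap<>();

  BackfillManagerSketch(boolean backfillEnabled) {
    this.backfillEnabled = backfillEnabled;
  }

  // Simplification: the handler is registered directly instead of being built from config.
  void registerHandler(String table, BackfillHandler handler) {
    handlers.put(table, handler);
  }

  // Called when a reset event arrives: start a backfill for the skipped range.
  void onOffsetReset(String table, String topic, int partition, long fromOffset, long toOffset) {
    BackfillHandler handler = handlers.get(table);
    if (!backfillEnabled || handler == null) {
      return;
    }
    String backfillTopic = handler.triggerBackfill(table, topic, partition, fromOffset, toOffset);
    activeTopics.computeIfAbsent(table, t -> new ArrayList<>()).add(backfillTopic);
  }

  // Periodic task: clean up and stop tracking every backfill topic that has completed.
  void runPeriodicCheck() {
    for (Map.Entry<String, List<String>> entry : activeTopics.entrySet()) {
      String table = entry.getKey();
      BackfillHandler handler = handlers.get(table);
      entry.getValue().removeIf(backfillTopic -> {
        if (!handler.isBackfillComplete(table, backfillTopic)) {
          return false;
        }
        handler.cleanUp(table, backfillTopic);
        return true;
      });
    }
  }

  List<String> getActiveTopics(String table) {
    return activeTopics.getOrDefault(table, new ArrayList<>());
  }
}
```

In this sketch a topic stays tracked until the handler reports completion, so repeated periodic checks are harmless while a backfill is still running.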
