lnbest0707-uber opened a new issue, #14815: URL: https://github.com/apache/pinot/issues/14815
[Issue and Proposal]

One of the biggest challenges of Pinot real-time data ingestion is keeping ingestion latency low, especially when experiencing traffic volume increases (spikes) from the stream, e.g. Kafka. Under the hood, Pinot does a 1-1 subscription to stream partitions: a single consuming-segment thread is responsible for pulling data from one stream partition, so per-partition ingestion speed is bottlenecked by the computation speed of a single CPU core. With heavy decoding, transformation, and indexing usage, the per-partition ingestion speed is capped. Hence, **once the per-partition data volume of the stream spikes, Pinot might experience high ingestion latency.**

To catch up with the latest messages, Pinot usually has to request a partition increase on the stream server, e.g. a Kafka partition-count bump. This operation usually requires human involvement and cannot be completed immediately. As a result, the table experiences high ingestion latency for a long time and, in most cases, cannot recover quickly even after the partition count increases.

Meanwhile, in many real-time analytics use cases, e.g. observability-related non-upsert cases:
- Users do not require a strong message-order guarantee.
- The most recent data (e.g. the last 15 minutes) is far more valuable than old data.

As a result, in real-world operations, users might ask Pinot admins to manually reset the ingestion offset to the latest by:
1. Calling the `Table/pauseConsumption` API
2. After all segments are sealed, calling the `Table/resumeConsumption` API to resume with `consumeFrom = largest`

This forces consumption of the latest messages immediately, **but loses the in-between messages that were skipped.**
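The manual reset described above could be scripted against the controller REST API roughly as follows. This is a minimal sketch: the controller address and table name are placeholders, and the exact `pauseConsumption`/`resumeConsumption` endpoint paths and the `consumeFrom` query parameter should be verified against the Pinot version in use.

```python
# Hedged sketch of the manual "skip to latest" procedure, assuming the
# controller exposes POST /tables/{table}/pauseConsumption and
# POST /tables/{table}/resumeConsumption?consumeFrom=largest.

def pause_consumption_url(controller: str, table: str) -> str:
    # Pauses all consuming segments for the table; POST this URL.
    return f"{controller}/tables/{table}/pauseConsumption"

def resume_consumption_url(controller: str, table: str,
                           consume_from: str = "largest") -> str:
    # Resumes consumption; consumeFrom=largest starts from the newest
    # offsets, skipping the in-between messages permanently.
    return f"{controller}/tables/{table}/resumeConsumption?consumeFrom={consume_from}"

if __name__ == "__main__":
    controller = "http://localhost:9000"   # hypothetical controller address
    table = "myTable_REALTIME"             # hypothetical table name
    # e.g. issue these with: curl -X POST "<url>"
    print(pause_consumption_url(controller, table))
    print(resume_consumption_url(controller, table))
```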
With the above issues, a near-freshness-guarantee feature would be required, so that it could:
- Reset the offset automatically based on the offset or timestamp lag
- Consume and serve the latest messages instantly, **but not lose the in-between data permanently**
- Ingest the in-between data in an offline or near-real-time manner

To fulfill this requirement, a proposed solution would be:
1. When a segment seals, the controller checks the last ingested offset, compares it with Kafka's largest offset, and determines whether to skip ahead based on (configurable):
   - Offset difference
   - Timestamp difference (based on Kafka metadata)
2. If skipping from offset1 to offset2, spawn a backfill job.

The backfill could be based on a Pinot Minion job:
- The controller updates the ideal state and forces the next segment to consume from offset2
- The controller dumps the offset-skip info (Pinot partition number, offset1, offset2) to Helix/ZK
- A Minion job is triggered to consume the skipped offset range and build segments offline

Alternatively, the backfill could also be done in a near-real-time way with multi-topic ingestion (https://github.com/apache/pinot/pull/13790):
- The controller adds a temporary topic to the table
- The temporary topic is responsible for consuming the missing offsets

In particular, if the stream system already has replication support, like [Uber uReplicator](https://www.uber.com/blog/ureplicator-apache-kafka-replicator/), to replicate the in-between messages to another topic, then it could be simplified as:
1. The controller notifies the stream system to replicate (topic name, partition number, from offset, to offset) to a "new topic"
2. The controller temporarily adds the "new topic" to the table config and ideal state
3. Finish the consumption of the "new topic"
4. Remove the temporary topic

Besides, with both proposals, some challenges need to be considered/resolved:
- How to deal with multiple offset resets on a partition
- How to split the offset range across multiple Minion jobs (what kind of policy to follow?), or how many partitions to provision for the "new topic"
- How to save and update the offset-skip and backfill status
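The seal-time check in step 1 of the proposal could be sketched as follows. All names and thresholds here are hypothetical illustrations, not existing Pinot configuration; the returned `(offset1, offset2)` pair is the skipped range the backfill job would need to cover.

```python
# Sketch of the configurable skip decision made when a segment seals,
# under the two proposed criteria: offset difference and timestamp
# difference (timestamps taken from Kafka metadata).
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SkipPolicy:
    # Hypothetical knobs mirroring the two proposed criteria.
    max_offset_lag: Optional[int] = None    # skip if offset lag exceeds this
    max_time_lag_ms: Optional[int] = None   # skip if timestamp lag exceeds this

def decide_skip(last_ingested_offset: int,
                stream_largest_offset: int,
                last_ingested_ts_ms: int,
                stream_latest_ts_ms: int,
                policy: SkipPolicy) -> Optional[Tuple[int, int]]:
    """Return (offset1, offset2) to skip from/to, or None if no skip is needed.

    offset1 is where consumption stopped; offset2 is the stream's largest
    offset, from which the next consuming segment would start. The range
    [offset1, offset2) is what the backfill job must ingest later.
    """
    offset_lag = stream_largest_offset - last_ingested_offset
    time_lag_ms = stream_latest_ts_ms - last_ingested_ts_ms
    if policy.max_offset_lag is not None and offset_lag > policy.max_offset_lag:
        return (last_ingested_offset, stream_largest_offset)
    if policy.max_time_lag_ms is not None and time_lag_ms > policy.max_time_lag_ms:
        return (last_ingested_offset, stream_largest_offset)
    return None
```

In this sketch the controller would persist the returned pair (together with the Pinot partition number) to Helix/ZK before triggering either the Minion job or the temporary-topic replication.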