[I] Auto-tuning Pinot real-time segment size based on actual stream data consumption [pinot]

via GitHub Wed, 28 Feb 2024 09:59:28 -0800


chenboat opened a new issue, #12513:
URL: https://github.com/apache/pinot/issues/12513


   Currently, Pinot's adaptive real-time segment sizing algorithm (as documented 
[here](https://www.linkedin.com/blog/engineering/open-source/auto-tuning-pinot)) 
makes segment sizes converge to a target byte size by adjusting the number of 
**rows** in each new segment based on the rows in previous segments. It relies 
on the following assumption:
   
   > We assume that the ratio of segment size to number of rows is a constant 
for each table (say, R).
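
   To make the assumption concrete, here is a minimal Python sketch of row-based sizing in its simplest form: estimate R from the last completed segment and divide the target size by it. The helper name `next_segment_rows` is hypothetical, and the real Pinot algorithm may smooth the estimate across several previous segments instead of using only the latest one.

```python
def next_segment_rows(target_bytes: int, prev_segment_bytes: int, prev_segment_rows: int) -> int:
    """Hypothetical helper: row threshold for the next segment, assuming the
    ratio R = segment bytes / rows of the previous segment stays constant."""
    if prev_segment_bytes <= 0 or prev_segment_rows <= 0:
        raise ValueError("previous segment must be non-empty")
    # target_bytes / R == target_bytes * rows / bytes; integer math avoids float error.
    return max(1, target_bytes * prev_segment_rows // prev_segment_bytes)


# A 100 MB segment with 1,000,000 rows gives R = 100 bytes/row,
# so a 200 MB target translates into a 2,000,000-row threshold.
print(next_segment_rows(200_000_000, 100_000_000, 1_000_000))  # 2000000
```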
   
   This assumption may not hold for use cases with spiky traffic, such as search 
log ingestion, where log volume depends on service state and can be highly 
volatile. In our results with the adaptive sizing algorithm, segment sizes 
varied widely because the data size per row changed.
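
   The sketch below illustrates this failure mode with made-up numbers (not our measured results): a row threshold chosen while rows averaged 100 bytes produces segments far from the target once the average row size shifts. `TARGET_BYTES`, `ROW_THRESHOLD`, and `committed_size_ratio` are hypothetical names used only for illustration.

```python
TARGET_BYTES = 200_000_000  # illustrative target size, not a Pinot default
ROW_THRESHOLD = 2_000_000   # rows chosen assuming R = 100 bytes/row


def committed_size_ratio(row_threshold: int, actual_bytes_per_row: int, target_bytes: int) -> float:
    """Committed segment size relative to the target when a row count,
    not a byte count, decides when the segment ends."""
    return row_threshold * actual_bytes_per_row / target_bytes


# Rows stay at 100 bytes, a spike quadruples row size, then traffic shrinks.
for bytes_per_row in (100, 400, 25):
    print(bytes_per_row, committed_size_ratio(ROW_THRESHOLD, bytes_per_row, TARGET_BYTES))
# 100 1.0
# 400 4.0
# 25 0.25
```

Because the threshold is fixed before the segment starts consuming, any change in bytes per row during that segment translates one-for-one into a size error.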
   
   We propose to predict segment size from the actual stream data consumed 
instead, which is a more accurate measure than the row count. After one server 
replica finishes consuming the target number of bytes, it can commit the 
segment and work with the remaining replicas so that each one either catches up 
to the committed offset (if it has not reached it yet) or downloads the finished 
segment and replaces its own copy. 
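
   A rough Python sketch of the proposal, under several assumptions: offsets are consecutive integers, a message's size is the length of its raw payload bytes (ignoring keys and headers), and the committing replica stops at the first message that reaches the budget, so it can overshoot by at most one message. `ByteBudgetConsumer`, `ReplicaAction`, and `action_for_replica` are hypothetical and not part of Pinot's API.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaAction(Enum):
    CATCH_UP = "catch_up"  # keep consuming up to the committed end offset
    DOWNLOAD = "download"  # fetch the committed segment and replace the local copy


@dataclass
class ByteBudgetConsumer:
    """Hypothetical consumer that ends a segment after target_bytes of stream data."""
    target_bytes: int
    consumed_bytes: int = 0
    next_offset: int = 0  # assumes offsets are consecutive integers starting at 0

    def consume(self, message: bytes) -> bool:
        """Count one message's payload size; return True once the byte budget is reached."""
        self.consumed_bytes += len(message)
        self.next_offset += 1
        return self.consumed_bytes >= self.target_bytes


def action_for_replica(replica_offset: int, committed_end_offset: int) -> ReplicaAction:
    # Replicas that have not reached the committed offset catch up to it;
    # every other replica replaces its segment with the committed one.
    if replica_offset < committed_end_offset:
        return ReplicaAction.CATCH_UP
    return ReplicaAction.DOWNLOAD


consumer = ByteBudgetConsumer(target_bytes=1_000)
for msg in (b"x" * 300, b"y" * 300, b"z" * 500, b"w" * 10):
    if consumer.consume(msg):
        break  # budget reached after the third message: 1,100 bytes, end offset 3

print(consumer.next_offset, consumer.consumed_bytes)  # 3 1100
print(action_for_replica(2, consumer.next_offset))    # ReplicaAction.CATCH_UP
```

Counting bytes at consumption time makes the segment boundary independent of how large individual rows are, which is exactly the quantity the row-based scheme has to guess.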


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


