chenboat opened a new issue, #12513: URL: https://github.com/apache/pinot/issues/12513
Currently, Pinot's adaptive realtime segment sizing algorithm (as documented [here](https://www.linkedin.com/blog/engineering/open-source/auto-tuning-pinot)) makes segment sizes converge to a target byte size. It adjusts the number of **rows** in new segments based on the rows in previous segments, relying on the following assumption:

> We assume that the ratio of segment size to number of rows is a constant for each table (say, R).

This assumption may not hold for spiky traffic use cases, such as search log ingestion, where log data volume depends on service state and can be highly volatile. Our results with the adaptive sizing algorithm show that segment sizes varied widely because the data size per row changes.

We propose instead to predict segment size from the actual stream data consumed, which is a more accurate measure than the row count. After one server replica finishes consuming the target number of bytes, it can commit the segment and coordinate with the remaining replicas: they either catch up to the offset it reached (if they have not already done so) or are asked to download the finished segment and replace their own copy.
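To make the failure mode concrete, here is a minimal Python sketch contrasting the two commit policies. Everything in it is hypothetical and illustrative, not Pinot's API: `SegmentStats`, `next_row_target`, `consume_until_commit` and the numbers are made up. The row-based estimate uses only the previous segment's ratio R (the real algorithm may smooth R across several segments), and "bytes consumed" is approximated as the sum of per-row message sizes.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class SegmentStats:
    """Hypothetical summary of a committed segment (not a Pinot class)."""
    num_rows: int
    size_bytes: int


def next_row_target(prev: SegmentStats, target_bytes: int) -> int:
    """Row-based sizing: assume bytes per row (R) stays constant across segments."""
    if prev.num_rows <= 0:
        raise ValueError("previous segment must have at least one row")
    ratio = prev.size_bytes / prev.num_rows  # R, learned from the previous segment only
    return max(1, int(target_bytes / ratio))


def should_commit_by_rows(rows_consumed: int, row_target: int) -> bool:
    # Current behavior (simplified): commit once the predicted row count is reached.
    return rows_consumed >= row_target


def should_commit_by_bytes(bytes_consumed: int, target_bytes: int) -> bool:
    # Proposed behavior (sketch): commit once the target number of stream bytes is consumed.
    return bytes_consumed >= target_bytes


def consume_until_commit(
    row_sizes: Iterable[int], should_commit: Callable[[int, int], bool]
) -> Tuple[int, int]:
    """Consume rows (each given by its message size in bytes) until the policy says commit.

    Returns (rows, bytes) at commit time.
    """
    rows = 0
    nbytes = 0
    for size in row_sizes:
        rows += 1
        nbytes += size
        if should_commit(rows, nbytes):
            break
    return rows, nbytes


if __name__ == "__main__":
    target_bytes = 200_000
    prev = SegmentStats(num_rows=1_000, size_bytes=200_000)  # R = 200 bytes/row
    row_target = next_row_target(prev, target_bytes)  # 1,000 rows

    # Traffic spike: each row is now 800 bytes instead of 200.
    by_rows = consume_until_commit(
        itertools.repeat(800), lambda r, b: should_commit_by_rows(r, row_target)
    )
    by_bytes = consume_until_commit(
        itertools.repeat(800), lambda r, b: should_commit_by_bytes(b, target_bytes)
    )
    print("row-based:", by_rows)    # (1000, 800000)
    print("byte-based:", by_bytes)  # (250, 200000)
```

With a previous segment of 1,000 rows and 200,000 bytes (R = 200 bytes/row), a spike to 800-byte rows makes the row-based policy commit a segment of 800,000 bytes (4x the target), while the byte-based policy commits at 200,000 bytes after 250 rows.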