Jackie-Jiang opened a new pull request, #11943: URL: https://github.com/apache/pinot/pull/11943
Currently, when committing a real-time segment, the controller needs to read the partition group metadata for all partitions from upstream, which can be very slow for a stream with many partitions. The partition group metadata is used only to extract the partition ids, which can be derived directly from the partition count, except for streams that close partitions, such as Kinesis.

In this PR, we made the following changes:

1. Only read the partition count from upstream if it is available
2. If the partition count is not available, fall back to the current approach
3. Log the time spent in each step for debugging
4. In `SegmentFlushThresholdComputer`, remove the logic that counts only the segments with the smallest partition id toward the size ratio. It complicates the handling considerably (and is something of an anti-pattern, since any segment commit then requires info from all partitions), and I don't see much value in it. Converging quickly to the size ratio of the recent data trend should be a pro rather than a con, because this ratio is used to decide the segment size needed to consume data for the same period of time.
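The count-first lookup with a fallback (changes 1 to 3) might be sketched roughly as follows. This is a minimal Python illustration, not Pinot's actual code: `resolve_partition_ids` and the two callables passed to it are hypothetical names, and it assumes partition ids form the contiguous range `0..count-1` whenever the count is available.

```python
import logging
import time
from typing import Callable, List, Optional

logger = logging.getLogger("segment_commit_sketch")


def resolve_partition_ids(
    fetch_partition_count: Callable[[], Optional[int]],
    fetch_partition_ids_from_metadata: Callable[[], List[int]],
) -> List[int]:
    """Return partition ids, preferring the cheap count-based path.

    Both callables are hypothetical stand-ins for upstream calls:
    - fetch_partition_count returns None when the stream cannot report
      a count (for example, a stream that closes partitions).
    - fetch_partition_ids_from_metadata is the slow path that reads
      metadata for every partition.
    """
    start = time.perf_counter()
    count = fetch_partition_count()
    if count is not None:
        # Assumption: ids are contiguous and start at 0.
        partition_ids = list(range(count))
        source = "partition count"
    else:
        # Fall back to the existing approach.
        partition_ids = sorted(fetch_partition_ids_from_metadata())
        source = "partition group metadata"
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Log the time spent in this step for debugging.
    logger.info(
        "Resolved %d partition ids from %s in %.1f ms",
        len(partition_ids), source, elapsed_ms,
    )
    return partition_ids
```

The slow metadata read is only invoked when the count lookup returns `None`, so a stream that can report its count never pays the per-partition cost.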