jtao15 commented on a change in pull request #7481:
URL: https://github.com/apache/pinot/pull/7481#discussion_r717066234

##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java
##########
@@ -77,20 +77,23 @@
 * A new ZNode will be created, with watermarkMs as the smallest time found in all segments truncated to the
 * closest bucket start time.
 * - The execution window for the task is calculated as,
- * windowStartMs = watermarkMs, windowEndMs = windowStartMs + bucketTimeMs
+ * windowStartMs = watermarkMs, windowEndMs = windowStartMs + bucketTimeMs * numParallelBuckets
 * - Skip scheduling if the window is invalid:
 * - If the execution window is not older than bufferTimeMs, no task will be generated
 * - The windowEndMs of higher merge level should be less or equal to the waterMarkMs of lower level
 * - Bump up target window and watermark if needed.
 * - If there's no unmerged segments (by checking segment zk metadata {mergeRollupTask.mergeLevel: level}) for
- * current window,
- * keep bumping up the watermark and target window until unmerged segments are found. Else skip the scheduling.
- * - Select all segments for the target window
- * - Create tasks (per partition for partitioned table) based on maxNumRecordsPerTask
+ * current window, keep bumping up the watermark and target window until unmerged segments are found.
+ * Else skip the scheduling.
+ * - Select segments for each bucket in the target window:
+ * - Skip buckets which all segments are merged
+ * - Pick buckets till the first bucket that has spilled over data

Review comment:
I'm trying to explain that the picking process will stop if we have one bucket with spilled over data. Updated the comment, hope it's clear now.
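
For readers following the thread, the window arithmetic and the bucket picking described in the updated Javadoc can be sketched roughly as below. This is only an illustration, not the code in MergeRollupTaskGenerator: the class `MergeWindowSketch`, the `Bucket` type, and all method names are hypothetical. It also makes two assumptions the Javadoc does not spell out: that "not older than bufferTimeMs" means the window end is later than now minus the buffer, and that the first bucket with spilled-over data is itself picked before the picking stops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the scheduling rules in the Javadoc; not Pinot code. */
public class MergeWindowSketch {

  /** Hypothetical per-bucket summary of the segments that fall into one bucket. */
  public static final class Bucket {
    public final long startMs;
    public final boolean allSegmentsMerged;
    public final boolean hasSpilledOverData;

    public Bucket(long startMs, boolean allSegmentsMerged, boolean hasSpilledOverData) {
      this.startMs = startMs;
      this.allSegmentsMerged = allSegmentsMerged;
      this.hasSpilledOverData = hasSpilledOverData;
    }
  }

  /** windowEndMs = windowStartMs + bucketTimeMs * numParallelBuckets, with windowStartMs = watermarkMs. */
  public static long windowEndMs(long watermarkMs, long bucketTimeMs, int numParallelBuckets) {
    return watermarkMs + bucketTimeMs * numParallelBuckets;
  }

  /** Assumed reading of the buffer rule: the window must end at or before (now - bufferTimeMs). */
  public static boolean isOlderThanBuffer(long windowEndMs, long bufferTimeMs, long nowMs) {
    return windowEndMs <= nowMs - bufferTimeMs;
  }

  /**
   * Walks the buckets of the window in time order, skips buckets whose segments
   * are all merged, and stops after the first picked bucket with spilled-over data.
   * Including that bucket in the result is an assumption of this sketch.
   */
  public static List<Bucket> pickBuckets(List<Bucket> bucketsInWindow) {
    List<Bucket> picked = new ArrayList<>();
    for (Bucket bucket : bucketsInWindow) {
      if (bucket.allSegmentsMerged) {
        continue; // nothing left to merge in this bucket
      }
      picked.add(bucket);
      if (bucket.hasSpilledOverData) {
        break; // later buckets wait for a subsequent scheduling run
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    long end = windowEndMs(0L, 86_400_000L, 3);
    System.out.println("windowEndMs = " + end);

    List<Bucket> buckets = Arrays.asList(
        new Bucket(0L, true, false),
        new Bucket(86_400_000L, false, true),
        new Bucket(172_800_000L, false, false));
    for (Bucket b : pickBuckets(buckets)) {
      System.out.println("picked bucket starting at " + b.startMs);
    }
  }
}
```

With three one-day buckets where the first is fully merged and the second has spilled-over data, the sketch picks only the second bucket and leaves the third for a later run.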