snleee commented on a change in pull request #7481: URL: https://github.com/apache/pinot/pull/7481#discussion_r726555357
##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java
##########
@@ -388,6 +425,13 @@ private boolean validate(TableConfig tableConfig, String taskType) {
     return true;
   }
 
+  /**
+   * Check if the segment span multiple buckets
+   */
+  private boolean hasSpilledOverData(SegmentZKMetadata segmentZKMetadata, long bucketMs) {
+    return segmentZKMetadata.getStartTimeMs() / bucketMs != segmentZKMetadata.getEndTimeMs() / bucketMs;

Review comment:
Do we have a guarantee that `startTime <= endTime` from `segmentZKMetadata`? If not, we may need to check `startTime / bucketMs < endTime / bucketMs` instead.

##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskUtils.java
##########
@@ -34,7 +35,8 @@ private MergeRollupTaskUtils() {
       MergeTask.ROUND_BUCKET_TIME_PERIOD_KEY,
       MergeTask.MERGE_TYPE_KEY,
       MergeTask.MAX_NUM_RECORDS_PER_SEGMENT_KEY,
-      MergeTask.MAX_NUM_RECORDS_PER_TASK_KEY
+      MergeTask.MAX_NUM_RECORDS_PER_TASK_KEY,
+      MergeRollupTask.NUM_PARALLEL_BUCKETS

Review comment:
Why don't we put this in `MergeTask` instead of `MergeRollupTask`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
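
To make the first review comment concrete, here is a minimal stand-alone sketch of the two bucket comparisons it contrasts. The class name `BucketSpillSketch` and both method names are hypothetical and are not part of the Pinot code base; the methods take raw millisecond values rather than a `SegmentZKMetadata`, and the sketch assumes non-negative timestamps and a positive `bucketMs`.

```java
// Hypothetical stand-alone sketch; not the Pinot implementation.
final class BucketSpillSketch {
  private BucketSpillSketch() {
  }

  /** Check as written in the diff: true whenever the two bucket indices differ. */
  static boolean spansBucketsNotEqual(long startTimeMs, long endTimeMs, long bucketMs) {
    return startTimeMs / bucketMs != endTimeMs / bucketMs;
  }

  /** Variant suggested in the review: true only when the end falls in a later bucket. */
  static boolean spansBucketsLessThan(long startTimeMs, long endTimeMs, long bucketMs) {
    return startTimeMs / bucketMs < endTimeMs / bucketMs;
  }
}
```

With a one-day bucket (86,400,000 ms), the two checks agree whenever `startTimeMs <= endTimeMs`. They differ only for an inverted range such as start = 86,400,000 and end = 0, where the `!=` form reports a spill-over and the `<` form does not. Which behavior is preferable depends on whether the metadata can actually contain such a range, which is the question the comment raises.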