snleee commented on code in PR #9890: URL: https://github.com/apache/pinot/pull/9890#discussion_r1041195396
##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ * - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ * process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ * may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ * - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ * Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ * - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ * the latest watermarks, and (2) not time aligned according to merge levels configurations
+ * - For realtime tables, those protocols are needed if streaming records arrive late (older thant the latest
+ * watermarks)
+ * - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  * - Pre-select segments:
  * - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  * entry and segmentsTo for IN_PROGRESS lineage entry)
+ * - For realtime tables, remove
+ * - in-progress segments, and

Review Comment:
I saw that you already did! thank you!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.