zhtaoxiang commented on code in PR #9890:
URL: https://github.com/apache/pinot/pull/9890#discussion_r1040662594


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java:
##########
@@ -61,13 +63,26 @@
 /**
  * A {@link PinotTaskGenerator} implementation for generating tasks of type {@link MergeRollupTask}
  *
- * TODO: Add the support for realtime table
+ * Assumptions:
+ *  - When the MergeRollupTask starts the first time, records older than the min(now ms, max end time ms of all ready to
+ *    process segments) - bufferTimeMs have already been ingested. If not, newly ingested records older than that time
+ *    may not be properly merged (Due to the latest watermarks advanced too far before records are ingested).
+ *  - If it is needed, there are backfill protocols to ingest and replace records older than the latest watermarks.
+ *    Those protocols can handle time alignment (according to merge levels configurations) correctly.
+ *  - If it is needed, there are reconcile protocols to merge & rollup newly ingested segments that are (1) older than
+ *    the latest watermarks, and (2) not time aligned according to merge levels configurations
+ *    - For realtime tables, those protocols are needed if streaming records arrive late (older than the latest
+ *      watermarks)
+ *    - For offline tables, those protocols are needed if there are non-time-aligned segments ingested accidentally.
  *
- * Steps:
  *
+ * Steps:
  *  - Pre-select segments:
  *    - Fetch all segments, select segments based on segment lineage (removing segmentsFrom for COMPLETED lineage
  *      entry and segmentsTo for IN_PROGRESS lineage entry)
+ *    - For realtime tables, remove
+ *      - in-progress segments, and

Review Comment:
   Nope. This is the IN_PROGRESS status that is defined in Segment.Realtime.Status.
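
   To make the distinction concrete, here is a minimal sketch of the realtime pre-selection step and of the cutoff from the first assumption in the Javadoc, min(now ms, max end time ms of all ready to process segments) - bufferTimeMs. All names here (`PreSelectSketch`, `SegmentInfo`, `RealtimeStatus`, `removeInProgress`, `ingestionCutoffMs`) are hypothetical stand-ins for illustration, not the classes or methods used in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: these types are hypothetical stand-ins, not Pinot classes.
final class PreSelectSketch {

  // Stand-in for a realtime segment status; only the two values needed here.
  enum RealtimeStatus { IN_PROGRESS, DONE }

  // Hypothetical minimal segment descriptor.
  static final class SegmentInfo {
    final String name;
    final RealtimeStatus status;
    final long endTimeMs;

    SegmentInfo(String name, RealtimeStatus status, long endTimeMs) {
      this.name = name;
      this.status = status;
      this.endTimeMs = endTimeMs;
    }
  }

  // Drops segments whose realtime status is IN_PROGRESS.
  static List<SegmentInfo> removeInProgress(List<SegmentInfo> segments) {
    List<SegmentInfo> ready = new ArrayList<>();
    for (SegmentInfo s : segments) {
      if (s.status != RealtimeStatus.IN_PROGRESS) {
        ready.add(s);
      }
    }
    return ready;
  }

  // min(now ms, max end time ms of all ready-to-process segments) - bufferTimeMs
  static long ingestionCutoffMs(long nowMs, List<SegmentInfo> ready, long bufferTimeMs) {
    if (ready.isEmpty()) {
      throw new IllegalArgumentException("no ready segments");
    }
    long maxEndTimeMs = Long.MIN_VALUE;
    for (SegmentInfo s : ready) {
      maxEndTimeMs = Math.max(maxEndTimeMs, s.endTimeMs);
    }
    return Math.min(nowMs, maxEndTimeMs) - bufferTimeMs;
  }

  public static void main(String[] args) {
    List<SegmentInfo> all = Arrays.asList(
        new SegmentInfo("seg_0", RealtimeStatus.DONE, 1_000L),
        new SegmentInfo("seg_1", RealtimeStatus.DONE, 2_000L),
        new SegmentInfo("seg_2", RealtimeStatus.IN_PROGRESS, 3_000L));
    List<SegmentInfo> ready = removeInProgress(all);
    System.out.println("ready segments: " + ready.size());
    System.out.println("cutoff ms: " + ingestionCutoffMs(5_000L, ready, 500L));
  }
}
```

   The filter looks only at the per-segment status, which is separate from the state of a lineage entry handled in the earlier pre-selection bullet.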



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

