tarun11Mavani opened a new issue, #15845:
URL: https://github.com/apache/pinot/issues/15845

   UpsertCompactMergeTask was introduced in [#14477](https://github.com/apache/pinot/pull/14477). I am creating this parent issue to track the work required to make this feature production-ready.
   
   - Fix the data inconsistency issue across segment replicas
   - Ensure SegmentRefresh task compatibility with the UpsertCompactMerge task ([#14633](https://github.com/apache/pinot/issues/14633))
   - Add documentation for UpsertCompactMergeTask
   
   
   ### Data inconsistency across segment replicas due to different segment creation times
   
   **Description:**
   
   We've identified an issue where differences in segment creation times across replicas cause merge compaction to behave differently on each server, leaving the replicas with inconsistent data. After several runs of the UpsertCompactMergeTask, a COUNT(*) query began returning different total row counts for a table whose consumption had been paused.
   Additionally, querying by a specific primary key, which should always return exactly one record regardless of which server handles the query, behaved inconsistently: it returned one record on some servers and none on others.
   
   
   **Root Cause Analysis:**
   
   During merge compaction, Pinot sets the creation time of the new segment (creationTimeNewSegment) from the maximum creation time among the old segments ([here](https://github.com/apache/pinot/blob/c9f0c47d0ad96607760b706a79802d1598222ef3/pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompactmerge/UpsertCompactMergeTaskExecutor.java#L103)):

   `creationTimeNewSegment >= max(creationTime(oldSegments))`

   However, replicas of the same segment can have different creation times on different servers, so the new segment can end up older than some replicas of the segments it replaces. Tie-breaking between the old and new segment can then differ from server to server, which leads to inconsistencies.
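
The current computation can be sketched as follows, assuming (hypothetically) that each old segment's creation time is read from a single replica. `CurrentCreationTimeSketch` and `maxCreationTime` are illustrative names, not Pinot APIs, and the timestamps are made up. The sketch only shows that the same max() yields a different value depending on which replica's view of S1 is used.

```java
import java.util.List;
import java.util.Map;

public class CurrentCreationTimeSketch {

  // Hypothetical: creationTimeNewSegment = max over the old segments' creation
  // times, where each creation time comes from a single replica's metadata.
  static long maxCreationTime(List<String> oldSegments, Map<String, Long> creationTimeFromOneReplica) {
    long max = Long.MIN_VALUE;
    for (String segment : oldSegments) {
      max = Math.max(max, creationTimeFromOneReplica.get(segment));
    }
    return max;
  }

  public static void main(String[] args) {
    long t = 1_000L; // stands in for time T
    List<String> oldSegments = List.of("S0", "S1");
    // The same segment S1 reports different creation times on different servers.
    Map<String, Long> server1View = Map.of("S0", t - 50, "S1", t);
    Map<String, Long> server2View = Map.of("S0", t - 50, "S1", t + 10);
    System.out.println(maxCreationTime(oldSegments, server1View)); // 1000
    System.out.println(maxCreationTime(oldSegments, server2View)); // 1010
  }
}
```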
   
   
   **Scenario:**
   
   - Record R1 is indexed in segment S1, which is committed at time T on Server1 and at T+10 on Server2, so S1's creation time differs between the two replicas.
   
   - S1 is selected for merge compaction along with segment S0. The new compacted segment, compact_S3, is assigned a creation time of T (based on the minimum creation time among the replicas).
   
   When compact_S3 is added or replaced:
   
   - Server1: R1 has the same comparison value in S1 and compact_S3, and the two segments also have the same creation time (T). Therefore, `shouldReplaceOnComparisonTie` returns true, and R1 in S1 is replaced by R1 in compact_S3.
   
   - Server2: R1's comparison value is again the same, but S1's creation time is T+10 while compact_S3's is T. Thus, `shouldReplaceOnComparisonTie` returns false, and R1 in S1 is retained.
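
To make the divergence concrete, here is a simplified, hypothetical model of the tie-break. It is not Pinot's actual `shouldReplaceOnComparisonTie` implementation; it only encodes the outcomes described in the scenario (on a comparison tie, the incoming segment replaces the current one unless the current segment's creation time is strictly newer). `TieBreakSketch` and `shouldReplaceOnTie` are illustrative names.

```java
public class TieBreakSketch {

  // Simplified stand-in for shouldReplaceOnComparisonTie: on equal comparison
  // values, the incoming record wins unless the current segment was created later.
  static boolean shouldReplaceOnTie(long incomingCreationTime, long currentCreationTime) {
    return incomingCreationTime >= currentCreationTime;
  }

  public static void main(String[] args) {
    long t = 1_000L;            // time T
    long compactS3Created = t;  // compact_S3 is assigned T

    // Server1: S1 was created at T, so the tie resolves in favor of compact_S3.
    boolean server1Replaces = shouldReplaceOnTie(compactS3Created, t);
    // Server2: S1 was created at T+10, so R1 stays in S1.
    boolean server2Replaces = shouldReplaceOnTie(compactS3Created, t + 10);

    System.out.println("Server1 replaces R1: " + server1Replaces); // true
    System.out.println("Server2 replaces R1: " + server2Replaces); // false
  }
}
```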
   
   In the next compaction task, the validDocIds of the older segments are fetched. If Server1's validDocIds for S1 indicate that all of its records have been replaced, S1 is marked for deletion. Consequently, S1 is deleted from both Server1 and Server2.
   
   Post-deletion:
   
   - Server1: No impact; the PK metadata points to R1 in compact_S3.
   
   - Server2: The PK metadata still points to R1 in the now-deleted S1, leading to data inconsistency: queries served by Server2 no longer return R1.
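
The post-deletion state can be reproduced with a toy model in which each server keeps a map from primary key to segment name. `DanglingPkSketch` and `pkResolves` are hypothetical and only mirror the sequence above: S1 is deleted cluster-wide based on Server1's validDocIds, and each server then checks whether R1's PK entry still points to a live segment.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DanglingPkSketch {

  // True if the server's PK entry points at a segment that still exists.
  static boolean pkResolves(Map<String, String> pkToSegment, Set<String> liveSegments, String pk) {
    String segment = pkToSegment.get(pk);
    return segment != null && liveSegments.contains(segment);
  }

  public static void main(String[] args) {
    // PK metadata after compact_S3 is added: only Server1 moved R1 to compact_S3.
    Map<String, String> server1Pk = Map.of("R1", "compact_S3");
    Map<String, String> server2Pk = Map.of("R1", "S1");

    Set<String> liveSegments = new HashSet<>(Set.of("compact_S3", "S1"));
    // Server1's validDocIds say S1 is fully replaced, so S1 is deleted on every server.
    liveSegments.remove("S1");

    System.out.println("Server1 finds R1: " + pkResolves(server1Pk, liveSegments, "R1")); // true
    System.out.println("Server2 finds R1: " + pkResolves(server2Pk, liveSegments, "R1")); // false
  }
}
```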
   
   **Proposed Solution:**
   
   To ensure consistency across replicas during merge compaction, we propose changing the logic so that creationTimeNewSegment is the maximum creation time across all replicas of the old segments. This ensures that shouldReplaceOnComparisonTie behaves consistently on all servers.
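
A minimal sketch of the proposed computation, assuming the task can obtain each old segment's creation time from every replica (how it would gather them is not specified here). `ProposedCreationTimeSketch`, its methods, and the segment-to-server map layout are hypothetical, and the tie rule is the same simplified assumption used for the scenario, not Pinot's actual code. With the scenario's values the new segment gets T+10, and R1 is then replaced on both servers.

```java
import java.util.List;
import java.util.Map;

public class ProposedCreationTimeSketch {

  // Hypothetical layout: segment name -> (server name -> creation time on that server).
  static long maxCreationTimeAcrossReplicas(List<String> oldSegments,
      Map<String, Map<String, Long>> creationTimeByReplica) {
    long max = Long.MIN_VALUE;
    for (String segment : oldSegments) {
      for (long creationTime : creationTimeByReplica.get(segment).values()) {
        max = Math.max(max, creationTime);
      }
    }
    return max;
  }

  // Simplified tie rule (assumption): replace unless the current segment is newer.
  static boolean shouldReplaceOnTie(long incomingCreationTime, long currentCreationTime) {
    return incomingCreationTime >= currentCreationTime;
  }

  public static void main(String[] args) {
    long t = 1_000L; // time T
    Map<String, Map<String, Long>> creationTimes = Map.of(
        "S0", Map.of("Server1", t - 50, "Server2", t - 50),
        "S1", Map.of("Server1", t, "Server2", t + 10));

    long newSegmentTime = maxCreationTimeAcrossReplicas(List.of("S0", "S1"), creationTimes);
    System.out.println(newSegmentTime); // 1010, i.e. T+10

    // Both servers now resolve the tie for R1 the same way.
    System.out.println(shouldReplaceOnTie(newSegmentTime, t));      // Server1: true
    System.out.println(shouldReplaceOnTie(newSegmentTime, t + 10)); // Server2: true
  }
}
```

Taking the maximum over every replica makes the new segment at least as new as any copy of the old segments, so under this tie rule no server sees an old replica that is strictly newer than compact_S3.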
   
   **Next Steps:**
   
   I plan to raise a PR implementing this change.
   
   cc: @klsince @Jackie-Jiang @ankitsultana @rohityadav1993 @tibrewalpratik17

