klsince commented on code in PR #15296:
URL: https://github.com/apache/pinot/pull/15296#discussion_r2006332515


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -1723,6 +1725,15 @@ public RealtimeSegmentDataManager(SegmentZKMetadata 
segmentZKMetadata, TableConf
     }
   }
 
+  // Consumption while building the segment is not allowed for the following 
tables:
+  // 1. Partial Upserts

Review Comment:
   good question about the snapshot.
   
   a key requirement of snapshots kept on disk is that the valid docs from the 
bitmap snapshots must be disjointed (i.e. all unique PKs), so in theory, it's 
still correct not to take snapshot of the segments being built and replaced 
timely, because if its bitmap is not persisted, then the existing snapshots on 
disk still compose a disjoint set of valid docs. But to be conservative, I 
think we can have both partial/full upsert checked here, but add a feature like 
`isAllowPartialUpsertConsumptionDuringCommit` for full upsert, so the check can 
be disabled if needed.



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -1066,7 +1067,7 @@ protected SegmentBuildDescriptor 
buildSegmentInternal(boolean forCommit) {
     // for partial upsert tables, do not release 
_partitionGroupConsumerSemaphore proactively and rely on offload()
     // to release the semaphore. This ensures new consuming segment is not 
consuming until the segment replacement is
     // complete.
-    if (_allowConsumptionDuringCommit) {
+    if (_allowConsumptionDuringBuild) {

Review Comment:
   I see. Is it safe to assume buildSegmentInternal() is only called when 
server has already consumed the data itself? IIUC, this must be the key 
assumption to release semaphore earlier.
   
   would it help to log a msg inside the two if-blocks to help debug, so we 
know the reason of releasing semaphore earlier?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to