klsince commented on code in PR #15296: URL: https://github.com/apache/pinot/pull/15296#discussion_r2006332515
########## pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java: ########## @@ -1723,6 +1725,15 @@ public RealtimeSegmentDataManager(SegmentZKMetadata segmentZKMetadata, TableConf } } + // Consumption while building the segment is not allowed for the following tables: + // 1. Partial Upserts Review Comment: good question about the snapshot. a key requirement of snapshots kept on disk is that the valid docs from the bitmap snapshots must be disjointed (i.e. all unique PKs), so in theory, it's still correct not to take snapshot of the segments being built and replaced timely, because if its bitmap is not persisted, then the existing snapshots on disk still compose a disjoint set of valid docs. But to be conservative, I think we can have both partial/full upsert checked here, but add a feature like `isAllowPartialUpsertConsumptionDuringCommit` for full upsert, so the check can be disabled if needed. ########## pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java: ########## @@ -1066,7 +1067,7 @@ protected SegmentBuildDescriptor buildSegmentInternal(boolean forCommit) { // for partial upsert tables, do not release _partitionGroupConsumerSemaphore proactively and rely on offload() // to release the semaphore. This ensures new consuming segment is not consuming until the segment replacement is // complete. - if (_allowConsumptionDuringCommit) { + if (_allowConsumptionDuringBuild) { Review Comment: I see. Is it safe to assume buildSegmentInternal() is only called when server has already consumed the data itself? IIUC, this must be the key assumption to release semaphore earlier. would it help to log a msg inside the two if-blocks to help debug, so we know the reason of releasing semaphore earlier? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org