Jackie-Jiang commented on code in PR #18020:
URL: https://github.com/apache/pinot/pull/18020#discussion_r3011530558


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -962,24 +966,18 @@ protected void doTakeSnapshot() {
           continue;
         }
         ThreadSafeMutableRoaringBitmap queryableDocIdsBitMap = null;
-        ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
-            immutableSegment.getValidDocIds() == null ? new 
ThreadSafeMutableRoaringBitmap()
-                : immutableSegment.getValidDocIds();
-        
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
-            validDocIdsBitMap);
+        ThreadSafeMutableRoaringBitmap validDocIdsBitMap = 
immutableSegment.getValidDocIds();
+        
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
 validDocIdsBitMap);
         if (_deleteRecordColumn != null) {
-          queryableDocIdsBitMap =
-              immutableSegment.getQueryableDocIds() == null ? new 
ThreadSafeMutableRoaringBitmap()
-                  : immutableSegment.getQueryableDocIds();
+          queryableDocIdsBitMap = immutableSegment.getQueryableDocIds();
           
immutableSegment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
               queryableDocIdsBitMap);
         }
         _updatedSegmentsSinceLastSnapshot.remove(segment);
         numImmutableSegments++;
-        numPrimaryKeysInSnapshot += 
validDocIdsBitMap.getMutableRoaringBitmap().getCardinality();
+        numPrimaryKeysInSnapshot += validDocIdsBitMap.getCardinality();

Review Comment:
   We are reading bytes and cardinality in multiple places, and updates is 
actually possible between these reads (e.g. one segment removed by retention 
manager). To ensure consistency, we might want to use a snapshot clone



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -962,24 +966,18 @@ protected void doTakeSnapshot() {
           continue;
         }
         ThreadSafeMutableRoaringBitmap queryableDocIdsBitMap = null;
-        ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
-            immutableSegment.getValidDocIds() == null ? new 
ThreadSafeMutableRoaringBitmap()
-                : immutableSegment.getValidDocIds();
-        
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
-            validDocIdsBitMap);
+        ThreadSafeMutableRoaringBitmap validDocIdsBitMap = 
immutableSegment.getValidDocIds();

Review Comment:
   I feel we might still want to have the null check. In case segment contains 
an invalid snapshot, it will have `null` bitmap but still has snapshot file



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentImpl.java:
##########
@@ -160,13 +159,14 @@ public void persistDocIdsSnapshot(String fileName, 
ThreadSafeMutableRoaringBitma
         LOGGER.warn("Previous snapshot was not taken cleanly. Remove tmp file: 
{}", tmpFile);
         FileUtils.deleteQuietly(tmpFile);
       }
-      MutableRoaringBitmap docIdsSnapshot = docIds.getMutableRoaringBitmap();
-      try (DataOutputStream dataOutputStream = new DataOutputStream(new 
FileOutputStream(tmpFile))) {
-        docIdsSnapshot.serialize(dataOutputStream);
+      byte[] bytes = docIds.toBytes();

Review Comment:
   Can you benchmark the performance of serialization vs clone? Will this cause 
longer pause?
   Also, will this save heap memory? Should we consider serializing using 
off-heap memory?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to