Jackie-Jiang commented on code in PR #18020:
URL: https://github.com/apache/pinot/pull/18020#discussion_r3011530558
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -962,24 +966,18 @@ protected void doTakeSnapshot() {
continue;
}
ThreadSafeMutableRoaringBitmap queryableDocIdsBitMap = null;
- ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
- immutableSegment.getValidDocIds() == null ? new
ThreadSafeMutableRoaringBitmap()
- : immutableSegment.getValidDocIds();
-
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
- validDocIdsBitMap);
+ ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
immutableSegment.getValidDocIds();
+
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
validDocIdsBitMap);
if (_deleteRecordColumn != null) {
- queryableDocIdsBitMap =
- immutableSegment.getQueryableDocIds() == null ? new
ThreadSafeMutableRoaringBitmap()
- : immutableSegment.getQueryableDocIds();
+ queryableDocIdsBitMap = immutableSegment.getQueryableDocIds();
immutableSegment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
queryableDocIdsBitMap);
}
_updatedSegmentsSinceLastSnapshot.remove(segment);
numImmutableSegments++;
- numPrimaryKeysInSnapshot +=
validDocIdsBitMap.getMutableRoaringBitmap().getCardinality();
+ numPrimaryKeysInSnapshot += validDocIdsBitMap.getCardinality();
Review Comment:
We are reading bytes and cardinality in multiple places, and updates is
actually possible between these reads (e.g. one segment removed by retention
manager). To ensure consistency, we might want to use a snapshot clone
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -962,24 +966,18 @@ protected void doTakeSnapshot() {
continue;
}
ThreadSafeMutableRoaringBitmap queryableDocIdsBitMap = null;
- ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
- immutableSegment.getValidDocIds() == null ? new
ThreadSafeMutableRoaringBitmap()
- : immutableSegment.getValidDocIds();
-
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
- validDocIdsBitMap);
+ ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
immutableSegment.getValidDocIds();
Review Comment:
I feel we might still want to have the null check. In case segment contains
an invalid snapshot, it will have `null` bitmap but still has snapshot file
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentImpl.java:
##########
@@ -160,13 +159,14 @@ public void persistDocIdsSnapshot(String fileName,
ThreadSafeMutableRoaringBitma
LOGGER.warn("Previous snapshot was not taken cleanly. Remove tmp file:
{}", tmpFile);
FileUtils.deleteQuietly(tmpFile);
}
- MutableRoaringBitmap docIdsSnapshot = docIds.getMutableRoaringBitmap();
- try (DataOutputStream dataOutputStream = new DataOutputStream(new
FileOutputStream(tmpFile))) {
- docIdsSnapshot.serialize(dataOutputStream);
+ byte[] bytes = docIds.toBytes();
Review Comment:
Can you benchmark the performance of serialization vs clone? Will this cause
longer pause?
Also, will this save heap memory? Should we consider serializing using
off-heap memory?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]