Copilot commented on code in PR #18020:
URL: https://github.com/apache/pinot/pull/18020#discussion_r3013582689
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes.java:
##########
@@ -564,6 +564,7 @@ protected GenericRow doUpdateRecord(GenericRow record,
RecordInfo recordInfo) {
_reusePreviousRow.init(currentSegment, currentDocId);
_partialUpsertHandler.merge(_reusePreviousRow, record,
_reuseMergeResultHolder);
_reuseMergeResultHolder.clear();
+ _reusePreviousRow.clear();
Review Comment:
Same as in the non-consistent-deletes manager: `_reusePreviousRow.clear()`
(and `_reuseMergeResultHolder.clear()`) should be in a finally block so the
reusable `LazyRow` doesn't retain the segment reference/cached values if
`_partialUpsertHandler.merge(...)` throws.
```suggestion
try {
_partialUpsertHandler.merge(_reusePreviousRow, record,
_reuseMergeResultHolder);
} finally {
_reuseMergeResultHolder.clear();
_reusePreviousRow.clear();
}
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java:
##########
@@ -431,6 +431,7 @@ protected GenericRow doUpdateRecord(GenericRow record,
RecordInfo recordInfo) {
_reusePreviousRow.init(currentSegment, currentDocId);
_partialUpsertHandler.merge(_reusePreviousRow, record,
_reuseMergeResultHolder);
_reuseMergeResultHolder.clear();
+ _reusePreviousRow.clear();
Review Comment:
`_reusePreviousRow.clear()` (and `_reuseMergeResultHolder.clear()`) are only
executed when `_partialUpsertHandler.merge(...)` returns normally. If merge
throws, the reusable `LazyRow` can keep holding the previous segment reference
and cached values, reintroducing the memory retention issue. Wrap the
merge/clear sequence in a try/finally to guarantee cleanup.
```suggestion
try {
_partialUpsertHandler.merge(_reusePreviousRow, record,
_reuseMergeResultHolder);
} finally {
_reuseMergeResultHolder.clear();
_reusePreviousRow.clear();
}
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -962,24 +966,18 @@ protected void doTakeSnapshot() {
continue;
}
ThreadSafeMutableRoaringBitmap queryableDocIdsBitMap = null;
- ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
- immutableSegment.getValidDocIds() == null ? new
ThreadSafeMutableRoaringBitmap()
- : immutableSegment.getValidDocIds();
-
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
- validDocIdsBitMap);
+ ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
immutableSegment.getValidDocIds();
+
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
validDocIdsBitMap);
if (_deleteRecordColumn != null) {
- queryableDocIdsBitMap =
- immutableSegment.getQueryableDocIds() == null ? new
ThreadSafeMutableRoaringBitmap()
- : immutableSegment.getQueryableDocIds();
+ queryableDocIdsBitMap = immutableSegment.getQueryableDocIds();
immutableSegment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
queryableDocIdsBitMap);
}
Review Comment:
`ImmutableSegmentImpl.getValidDocIds()` / `getQueryableDocIds()` are
annotated @Nullable, but this snapshot path now persists them unconditionally.
If either bitmap is null (e.g., segment hasn't had upsert enabled or TTL
handling left it unset), this will NPE inside `persistDocIdsSnapshot()` and/or
when calling `getCardinality()`. Consider adding explicit null handling here
(skip snapshot for that bitmap, or fail fast with a clear message) consistent
with the null checks added in the `segmentsWithoutSnapshot` loop.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/LazyRow.java:
##########
@@ -46,7 +46,6 @@ public LazyRow() {
}
public void init(IndexSegment segment, int docId) {
Review Comment:
`init()` no longer clears the cached field/null-value state. Since
`getValue()` memoizes per-column results, reusing a single `LazyRow` instance
across different docIds/segments without calling `clear()` first can return
stale values from the previous row. This also contradicts the class-level
Javadoc that says reusing is done by re-initializing via `init(...)`. Consider
restoring `clear()` at the start of `init(...)` (while still keeping the new
post-merge `clear()` calls for eager segment reference cleanup), or otherwise
updating callers + Javadoc so `init(...)` cannot be misused.
```suggestion
public void init(IndexSegment segment, int docId) {
clear();
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -1004,21 +1002,23 @@ protected void doTakeSnapshot() {
}
try {
ThreadSafeMutableRoaringBitmap segmentQueryableDocIdsBitMap = null;
- ThreadSafeMutableRoaringBitmap segmentValidDocIdsBitMap =
- segment.getValidDocIds() == null ? new
ThreadSafeMutableRoaringBitmap() : segment.getValidDocIds();
-
segment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
segmentValidDocIdsBitMap);
+ ThreadSafeMutableRoaringBitmap segmentValidDocIdsBitMap =
segment.getValidDocIds();
+ // segment that is out of TTL with no valid docId snapshot can be
skipped while taking snapshot
+ // i.e such segments will have null bitmap
+ if (segmentValidDocIdsBitMap != null) {
+
segment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
segmentValidDocIdsBitMap);
+ numPrimaryKeysInSnapshot +=
segmentValidDocIdsBitMap.getCardinality();
+ }
if (_deleteRecordColumn != null) {
- segmentQueryableDocIdsBitMap = segment.getQueryableDocIds() ==
null ? new ThreadSafeMutableRoaringBitmap()
- : segment.getQueryableDocIds();
-
segment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
- segmentQueryableDocIdsBitMap);
+ segmentQueryableDocIdsBitMap = segment.getQueryableDocIds();
+ if (segmentQueryableDocIdsBitMap != null) {
+
segment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
+ segmentQueryableDocIdsBitMap);
+ numQueryableDocIdsInSnapshot +=
segmentQueryableDocIdsBitMap.getCardinality();
+ }
}
_updatedSegmentsSinceLastSnapshot.remove(segment);
numImmutableSegments++;
- numPrimaryKeysInSnapshot +=
segmentValidDocIdsBitMap.getMutableRoaringBitmap().getCardinality();
- if (_deleteRecordColumn != null) {
- numQueryableDocIdsInSnapshot +=
segmentQueryableDocIdsBitMap.getMutableRoaringBitmap().getCardinality();
- }
} catch (Exception e) {
Review Comment:
In the `segmentsWithoutSnapshot` loop, `numImmutableSegments++` and
`_updatedSegmentsSinceLastSnapshot.remove(segment)` happen even when
`segmentValidDocIdsBitMap` / `segmentQueryableDocIdsBitMap` are null and no
snapshot files are written. This can skew snapshot metrics/logging and also
drops the segment from the "needs snapshot" tracking, preventing a retry later.
Consider only counting/removing the segment from
`_updatedSegmentsSinceLastSnapshot` when at least one snapshot file is actually
persisted (or explicitly persisting an empty snapshot when that’s the desired
behavior).
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentImpl.java:
##########
@@ -160,13 +159,14 @@ public void persistDocIdsSnapshot(String fileName,
ThreadSafeMutableRoaringBitma
LOGGER.warn("Previous snapshot was not taken cleanly. Remove tmp file:
{}", tmpFile);
FileUtils.deleteQuietly(tmpFile);
}
- MutableRoaringBitmap docIdsSnapshot = docIds.getMutableRoaringBitmap();
- try (DataOutputStream dataOutputStream = new DataOutputStream(new
FileOutputStream(tmpFile))) {
- docIdsSnapshot.serialize(dataOutputStream);
+ byte[] bytes = docIds.toBytes();
+ int cardinality = docIds.getCardinality();
Review Comment:
`bytes` and `cardinality` are fetched via two separate synchronized calls on
`docIds`, so the logged cardinality can drift from the serialized snapshot if
the bitmap is mutated between calls. Consider capturing both under a single
lock (e.g., `synchronized (docIds) { ... }`) or adding a single method that
returns both snapshot bytes and cardinality atomically.
```suggestion
byte[] bytes;
int cardinality;
synchronized (docIds) {
// Capture snapshot bytes and cardinality from the same bitmap state
bytes = docIds.toBytes();
cardinality = docIds.getCardinality();
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]