deepthi912 commented on code in PR #16344:
URL: https://github.com/apache/pinot/pull/16344#discussion_r2304966124
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/CompactedPinotSegmentRecordReader.java:
##########
@@ -40,60 +42,159 @@ public class CompactedPinotSegmentRecordReader implements RecordReader {
private final String _deleteRecordColumn;
// Reusable generic row to store the next row to return
private final GenericRow _nextRow = new GenericRow();
- // Valid doc ids iterator
+
+ // Iterator approach for valid document IDs
private PeekableIntIterator _validDocIdsIterator;
+
+ // Index-based approach for sorted valid document IDs
+ private int[] _sortedValidDocIds;
+ private int _currentDocIndex = 0;
+
// Flag to mark whether we need to fetch another row
private boolean _nextRowReturned = true;
public CompactedPinotSegmentRecordReader(RoaringBitmap validDocIds) {
this(validDocIds, null);
}
- public CompactedPinotSegmentRecordReader(RoaringBitmap validDocIds,
- @Nullable String deleteRecordColumn) {
+ public CompactedPinotSegmentRecordReader(RoaringBitmap validDocIds, @Nullable String deleteRecordColumn) {
_pinotSegmentRecordReader = new PinotSegmentRecordReader();
_validDocIdsBitmap = validDocIds;
_validDocIdsIterator = validDocIds.getIntIterator();
_deleteRecordColumn = deleteRecordColumn;
}
+ public CompactedPinotSegmentRecordReader(ThreadSafeMutableRoaringBitmap validDocIds) {
+ this(validDocIds, null);
+ }
+
+ public CompactedPinotSegmentRecordReader(ThreadSafeMutableRoaringBitmap validDocIds,
Review Comment:
QQ, for my understanding: this will compact the current mutable segment
during commit, correct? As consuming segments keep invalidating records in the
existing segments, will those segments remain unaffected by this code change,
or are we also trying to modify the existing, already-updated segments? If the
former, this will only help when the PKs are invalidated within the same
segment, correct?
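A minimal toy model of the scenario the question describes. It assumes the usual upsert bookkeeping, where a newer row for a primary key clears the older row's bit in whichever segment holds it, and that commit-time compaction copies only the rows still valid in the committing segment. `java.util.BitSet` stands in for `RoaringBitmap` / `ThreadSafeMutableRoaringBitmap`, and `UpsertCompactionSketch`, `Segment`, `ingest` and `compact` are hypothetical names, not Pinot APIs. This sketches the reviewer's hypothesis and does not claim to describe what the PR does.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of upsert validDocIds bookkeeping (not Pinot code).
// java.util.BitSet stands in for RoaringBitmap / ThreadSafeMutableRoaringBitmap.
public class UpsertCompactionSketch {
  // A segment: primary keys in docId order plus a bitmap of still-valid docIds.
  static final class Segment {
    final List<String> primaryKeys = new ArrayList<>();
    final BitSet validDocIds = new BitSet();
  }

  // Where the latest row for a primary key lives.
  record Location(Segment segment, int docId) {}

  private final Map<String, Location> _pkToLocation = new HashMap<>();

  // Append a row to the consuming segment, then clear the bit of the previous row
  // for the same PK, whether it sits in this segment or in an older committed one.
  void ingest(Segment consuming, String pk) {
    int docId = consuming.primaryKeys.size();
    consuming.primaryKeys.add(pk);
    consuming.validDocIds.set(docId);
    Location previous = _pkToLocation.put(pk, new Location(consuming, docId));
    if (previous != null) {
      previous.segment().validDocIds.clear(previous.docId());
    }
  }

  // Commit-time compaction of one segment: copy only rows whose docId is still valid.
  static List<String> compact(Segment segment) {
    List<String> kept = new ArrayList<>();
    for (int docId = segment.validDocIds.nextSetBit(0); docId >= 0;
        docId = segment.validDocIds.nextSetBit(docId + 1)) {
      kept.add(segment.primaryKeys.get(docId));
    }
    return kept;
  }

  public static void main(String[] args) {
    UpsertCompactionSketch table = new UpsertCompactionSketch();
    Segment committed = new Segment();
    table.ingest(committed, "a");
    table.ingest(committed, "b");

    Segment consuming = new Segment();
    table.ingest(consuming, "a"); // invalidates committed doc 0 (cross-segment)
    table.ingest(consuming, "c");
    table.ingest(consuming, "c"); // invalidates consuming doc 1 (same segment)

    // Compacting the consuming segment drops only the row superseded inside it.
    System.out.println(compact(consuming)); // prints [a, c]
    // The committed segment still physically holds both rows; only its bitmap changed.
    System.out.println(committed.primaryKeys.size() + " rows, valid=" + committed.validDocIds); // prints 2 rows, valid={1}
  }
}
```

In this model, the stale "a" row in the committed segment is only masked by its bitmap, so compacting the consuming segment at commit reclaims space solely for PKs that were superseded within that same segment.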