rohityadav1993 commented on code in PR #16727:
URL: https://github.com/apache/pinot/pull/16727#discussion_r2430151427
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -396,27 +372,58 @@ public void indexColumn(String columnName, @Nullable
int[] sortedDocIds, IndexSe
if (sortedDocIds != null) {
int onDiskDocId = 0;
for (int docId : sortedDocIds) {
- // If validDodIds are provided, only index column if it's a valid doc
- if (validDocIds == null || validDocIds.contains(docId)) {
- indexColumnValue(colReader, creatorsByIndex, columnName,
fieldSpec, dictionaryCreator, docId, onDiskDocId,
- nullVec);
- onDiskDocId++;
- }
+ indexColumnValue(colReader, creatorsByIndex, columnName, fieldSpec,
dictionaryCreator, docId, onDiskDocId,
+ nullVec);
+ onDiskDocId++;
}
} else {
- int onDiskDocId = 0;
for (int docId = 0; docId < numDocs; docId++) {
- // If validDodIds are provided, only index column if it's a valid doc
- if (validDocIds == null || validDocIds.contains(docId)) {
- indexColumnValue(colReader, creatorsByIndex, columnName,
fieldSpec, dictionaryCreator, docId, onDiskDocId,
- nullVec);
- onDiskDocId++;
- }
+ indexColumnValue(colReader, creatorsByIndex, columnName, fieldSpec,
dictionaryCreator, docId, docId, nullVec);
}
}
}
}
+ /**
+ * Index a column using a ColumnReader (column-major approach).
+ * This method processes the column values using the iterator pattern from
ColumnReader.
+ *
+ * @param columnName Name of the column to index
+ * @param columnReader ColumnReader for the column data
+ * @throws IOException if indexing fails
+ */
+ @Override
+ public void indexColumn(String columnName, ColumnReader columnReader) throws
IOException {
+ Map<IndexType<?, ?, ?>, IndexCreator> creatorsByIndex =
_creatorsByColAndIndex.get(columnName);
+ NullValueVectorCreator nullVec =
_nullValueVectorCreatorMap.get(columnName);
+ FieldSpec fieldSpec = _schema.getFieldSpecFor(columnName);
+ SegmentDictionaryCreator dictionaryCreator =
_dictionaryCreatorMap.get(columnName);
+ Object defaultNullValue = fieldSpec.getDefaultNullValue();
+ Object reuseColumnValueToIndex;
+
+ // Reset column reader to start from beginning
+ columnReader.rewind();
+
+ int docId = 0;
+ while (columnReader.hasNext()) {
+ reuseColumnValueToIndex = columnReader.next();
+
+ // Handle null values
+ if (nullVec != null && reuseColumnValueToIndex == null) {
+ nullVec.setNull(docId);
+ reuseColumnValueToIndex = defaultNullValue;
Review Comment:
Good catch, so in current implementation we are doing segment to segment
transformation and with above we will never have a null value to index.
For non Pinot segment datasource, we can have a null value, I previously
removed a util method to do null check on the read value basedon PR comment.
Let me handle case for a read null value with default value when we don't
have null handling enabled. cc: @Jackie-Jiang , @noob-se7en to review as well.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##########
@@ -171,31 +170,72 @@ public void init(SegmentGeneratorConfig config,
RecordReader recordReader)
new TransformPipeline(config.getTableConfig(), config.getSchema()));
}
+ /**
+ * Initialize the driver for columnar segment building using a
ColumnReaderFactory.
+ * This method sets up the driver to use column-wise input data access
instead of row-wise.
+ *
+ * @param config Segment generator configuration
+ * @param columnReaderFactory Factory for creating column readers
+ * @throws Exception if initialization fails
+ */
+ public void init(SegmentGeneratorConfig config, ColumnReaderFactory
columnReaderFactory)
+ throws Exception {
+ // Initialize the column reader factory with target schema
+ columnReaderFactory.init(config.getSchema());
+
+ // Create all column readers for the target schema
+ Map<String, ColumnReader> columnReaders =
columnReaderFactory.getAllColumnReaders();
+
+ // Create columnar data source
+ ColumnarSegmentCreationDataSource columnarDataSource =
+ new ColumnarSegmentCreationDataSource(columnReaderFactory,
columnReaders);
+
+ // Use the existing init method with columnar data source and no transform
pipeline
+ init(config, columnarDataSource, null);
Review Comment:
Yes, we will add transformation support later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]