Re: [PR] Followups on column reader changes [pinot]

via GitHub Tue, 09 Dec 2025 06:04:09 -0800


krishan1390 commented on code in PR #17293:
URL: https://github.com/apache/pinot/pull/17293#discussion_r2602485648



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/readers/DefaultValueColumnReaderTest.java:
##########
@@ -65,7 +65,7 @@ public void testSingleValueIntColumn() throws IOException {
     int expectedValue = ((Number) fieldSpec.getDefaultNullValue()).intValue();
     for (int i = 0; i < NUM_DOCS; i++) {
       Assert.assertTrue(reader.hasNext());
-      Assert.assertFalse(reader.isNextNull());
+      Assert.assertTrue(reader.isNextNull());
       Assert.assertEquals(reader.nextInt(), expectedValue);

Review Comment:
   >nextInt should only be called when isNextNull is false.
   
   Yes in 
https://github.com/apache/pinot/pull/17304/files#diff-b2f354bd2a7666251988f742a8e710c754933547eaa67fbff28992ed18997705
 I am making the change to throw an exception for nextInt() in 
DefaultValueColumnReader
   
   I am removing this change of returning true from this PR as its handled 
better in that PR 



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnReader.java:
##########
@@ -275,6 +280,20 @@ Object next()
   String getString(int docId) throws IOException;
   byte[] getBytes(int docId) throws IOException;
 
+  /**
+   * Get the value at the given document ID as a Java Object.
+   * Can be used for both single-value and multi-value columns.
+   * This should be used if
+   * 1. The data type is not known at compile time

Review Comment:
   yea. updated the comment to clarify more



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/PinotSegmentColumnReaderFactory.java:
##########
@@ -91,17 +91,12 @@ public int getNumDocs() {
   }
 
   @Override
-  public ColumnReader getColumnReader(String columnName)
-      throws IOException {
+  public ColumnReader getColumnReader(String columnName) {

Review Comment:
   The use case for the method is to get a column reader from a input file. 
   
   The client needs to know if the input file doesn't contain the column. If we 
throw an IOException in such a case, we can't know if there is an exception in 
creating the reader or if the column doesn't exist. 
   
   Thus returning null can help the client differentiate that. 
   
   Currently PinotSegmentColumnReaderFactory creates a DefaultValueColumnReader 
if it doesn't exist. Ideally it shouldn't do that. The client should handle 
case where the input doesn't have the column (in some cases it might make sense 
to fail or not read that column etc)
   
   I am not making the changes to remove DefaultValueColumnReader from 
PinotSegmentColumnReaderFactory because it might break something if some 
external code depends on that. 
   
   However other factories that we will create will return null if a column 
doesn't exist. So made the change here too to return null. This shouldn't break 
anything major if some external code is using this factory



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnarDataSource.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * <p>
+ * This interface is designed to support the creation of multiple {@link 
ColumnReaderFactory} instances
+ * from the same underlying data source. This capability is required because 
different processing contexts
+ * may require distinct column reader objects for the same input file and 
column. For example:
+ * <ul>
+ *   <li>Different processing stages may need independent iterators over the 
same column</li>
+ *   <li>Parallel processing threads may each require their own column 
readers</li>
+ * </ul>
+ * <p>
+ */
+public interface ColumnarDataSource extends Closeable {

Review Comment:
   does the java doc in the class and createColumnReaderFactory() help in 
answering this ? 
   
   ColumnarDataSource is similar to RecordReaderFileConfig that it holds 
required attributes to initialise the ColumnReaderFactory / ColumnReader
   
   For the same input attributes (file name, etc), we want to create 
independent instances of ColumnReaderFactory / ColumnReader 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Followups on column reader changes [pinot]

Reply via email to