Re: [PR] Followups on column reader changes [pinot]

via GitHub Tue, 09 Dec 2025 03:42:54 -0800


9aman commented on code in PR #17293:
URL: https://github.com/apache/pinot/pull/17293#discussion_r2602191717



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/readers/DefaultValueColumnReaderTest.java:
##########
@@ -65,7 +65,7 @@ public void testSingleValueIntColumn() throws IOException {
     int expectedValue = ((Number) fieldSpec.getDefaultNullValue()).intValue();
     for (int i = 0; i < NUM_DOCS; i++) {
       Assert.assertTrue(reader.hasNext());
-      Assert.assertFalse(reader.isNextNull());
+      Assert.assertTrue(reader.isNextNull());
       Assert.assertEquals(reader.nextInt(), expectedValue);

Review Comment:
   This change kind of makes these tests depict the wrong access behaviour
   
   nextInt should only be called when isNextNull is false.
   We can still call hasNext and next as it does not guarantee non-null 
returns. 
   
   Confused me a bit while reading. 
   
   Is guess the entire objective to return true in `isNextNull` is to ensure 
that the reader does not go ahead with some default value reads. 
   
   



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnarDataSource.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * <p>
+ * This interface is designed to support the creation of multiple {@link 
ColumnReaderFactory} instances
+ * from the same underlying data source. This capability is required because 
different processing contexts
+ * may require distinct column reader objects for the same input file and 
column. For example:
+ * <ul>
+ *   <li>Different processing stages may need independent iterators over the 
same column</li>
+ *   <li>Parallel processing threads may each require their own column 
readers</li>
+ * </ul>
+ * <p>
+ */
+public interface ColumnarDataSource extends Closeable {

Review Comment:
   Noob question 
   @krishan1390 
   
    Why can't we directly use: `ColumnReaderFactory`
   
   How and where does this additional level of abstraction help ? 
   
   I see an already written columnar reader that is being used during segment 
reload that directly relies on the factory ? 
   
   What value add do we expect from this interface in future ? 



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/readers/DefaultValueColumnReaderTest.java:
##########
@@ -65,7 +65,7 @@ public void testSingleValueIntColumn() throws IOException {
     int expectedValue = ((Number) fieldSpec.getDefaultNullValue()).intValue();
     for (int i = 0; i < NUM_DOCS; i++) {
       Assert.assertTrue(reader.hasNext());
-      Assert.assertFalse(reader.isNextNull());
+      Assert.assertTrue(reader.isNextNull());
       Assert.assertEquals(reader.nextInt(), expectedValue);

Review Comment:
   My concern here is that hadNext() and next() access pattern for a long  
default column will give 0 while isNextNull() access pattern would expect the 
user to skip the value and handle nulls.
   
   Or do we return null object as the default value ?
   
   `_defaultValue = defaultNullValue;`
   
   even for primitive types ? 



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnReader.java:
##########
@@ -275,6 +280,20 @@ Object next()
   String getString(int docId) throws IOException;
   byte[] getBytes(int docId) throws IOException;
 
+  /**
+   * Get the value at the given document ID as a Java Object.
+   * Can be used for both single-value and multi-value columns.
+   * This should be used if
+   * 1. The data type is not known at compile time

Review Comment:
   Compile time ?? 



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/PinotSegmentColumnReaderFactory.java:
##########
@@ -91,17 +91,12 @@ public int getNumDocs() {
   }
 
   @Override
-  public ColumnReader getColumnReader(String columnName)
-      throws IOException {
+  public ColumnReader getColumnReader(String columnName) {

Review Comment:
   Why are we moving away from throwing an exception. Is the logic similar to 
that of next() where the caller is expected to return null. 
   
   
   Is so, are there benefits of the same. ? 
   
   In case we are sticking to this, can we just update the interface to say 
that a null will be returned in case the ColumnReader is not present. 
   
     /**
      * Get a column reader for the specified column.
      * Implementations may cache and reuse readers for efficiency.
      *
      * @param columnName Name of the column to read
      * @return ColumnReader instance for the specified column (may be cached)
      */
     @Nullable ColumnReader getColumnReader(String columnName);
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Followups on column reader changes [pinot]

Reply via email to