9aman commented on code in PR #17293:
URL: https://github.com/apache/pinot/pull/17293#discussion_r2602191717
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/readers/DefaultValueColumnReaderTest.java:
##########
@@ -65,7 +65,7 @@ public void testSingleValueIntColumn() throws IOException {
int expectedValue = ((Number) fieldSpec.getDefaultNullValue()).intValue();
for (int i = 0; i < NUM_DOCS; i++) {
Assert.assertTrue(reader.hasNext());
- Assert.assertFalse(reader.isNextNull());
+ Assert.assertTrue(reader.isNextNull());
Assert.assertEquals(reader.nextInt(), expectedValue);
Review Comment:
This change kind of makes these tests depict the wrong access behaviour
nextInt should only be called when isNextNull is false.
We can still call hasNext and next as it does not guarantee non-null
returns.
Confused me a bit while reading.
Is guess the entire objective to return true in `isNextNull` is to ensure
that the reader does not go ahead with some default value reads.
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnarDataSource.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * <p>
+ * This interface is designed to support the creation of multiple {@link
ColumnReaderFactory} instances
+ * from the same underlying data source. This capability is required because
different processing contexts
+ * may require distinct column reader objects for the same input file and
column. For example:
+ * <ul>
+ * <li>Different processing stages may need independent iterators over the
same column</li>
+ * <li>Parallel processing threads may each require their own column
readers</li>
+ * </ul>
+ * <p>
+ */
+public interface ColumnarDataSource extends Closeable {
Review Comment:
Noob question
@krishan1390
Why can't we directly use: `ColumnReaderFactory`
How and where does this additional level of abstraction help ?
I see an already written columnar reader that is being used during segment
reload that directly relies on the factory ?
What value add do we expect from this interface in future ?
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/readers/DefaultValueColumnReaderTest.java:
##########
@@ -65,7 +65,7 @@ public void testSingleValueIntColumn() throws IOException {
int expectedValue = ((Number) fieldSpec.getDefaultNullValue()).intValue();
for (int i = 0; i < NUM_DOCS; i++) {
Assert.assertTrue(reader.hasNext());
- Assert.assertFalse(reader.isNextNull());
+ Assert.assertTrue(reader.isNextNull());
Assert.assertEquals(reader.nextInt(), expectedValue);
Review Comment:
My concern here is that hadNext() and next() access pattern for a long
default column will give 0 while isNextNull() access pattern would expect the
user to skip the value and handle nulls.
Or do we return null object as the default value ?
`_defaultValue = defaultNullValue;`
even for primitive types ?
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnReader.java:
##########
@@ -275,6 +280,20 @@ Object next()
String getString(int docId) throws IOException;
byte[] getBytes(int docId) throws IOException;
+ /**
+ * Get the value at the given document ID as a Java Object.
+ * Can be used for both single-value and multi-value columns.
+ * This should be used if
+ * 1. The data type is not known at compile time
Review Comment:
Compile time ??
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/PinotSegmentColumnReaderFactory.java:
##########
@@ -91,17 +91,12 @@ public int getNumDocs() {
}
@Override
- public ColumnReader getColumnReader(String columnName)
- throws IOException {
+ public ColumnReader getColumnReader(String columnName) {
Review Comment:
Why are we moving away from throwing an exception. Is the logic similar to
that of next() where the caller is expected to return null.
Is so, are there benefits of the same. ?
In case we are sticking to this, can we just update the interface to say
that a null will be returned in case the ColumnReader is not present.
/**
* Get a column reader for the specified column.
* Implementations may cache and reuse readers for efficiency.
*
* @param columnName Name of the column to read
* @return ColumnReader instance for the specified column (may be cached)
*/
@Nullable ColumnReader getColumnReader(String columnName);
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]