krishan1390 commented on code in PR #17293:
URL: https://github.com/apache/pinot/pull/17293#discussion_r2602485648
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/readers/DefaultValueColumnReaderTest.java:
##########
@@ -65,7 +65,7 @@ public void testSingleValueIntColumn() throws IOException {
int expectedValue = ((Number) fieldSpec.getDefaultNullValue()).intValue();
for (int i = 0; i < NUM_DOCS; i++) {
Assert.assertTrue(reader.hasNext());
- Assert.assertFalse(reader.isNextNull());
+ Assert.assertTrue(reader.isNextNull());
Assert.assertEquals(reader.nextInt(), expectedValue);
Review Comment:
>nextInt should only be called when isNextNull is false.
Yes in
https://github.com/apache/pinot/pull/17304/files#diff-b2f354bd2a7666251988f742a8e710c754933547eaa67fbff28992ed18997705
I am making the change to throw an exception for nextInt() in
DefaultValueColumnReader
I am removing this change of returning true from this PR as its handled
better in that PR
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnReader.java:
##########
@@ -275,6 +280,20 @@ Object next()
String getString(int docId) throws IOException;
byte[] getBytes(int docId) throws IOException;
+ /**
+ * Get the value at the given document ID as a Java Object.
+ * Can be used for both single-value and multi-value columns.
+ * This should be used if
+ * 1. The data type is not known at compile time
Review Comment:
yea. updated the comment to clarify more
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/PinotSegmentColumnReaderFactory.java:
##########
@@ -91,17 +91,12 @@ public int getNumDocs() {
}
@Override
- public ColumnReader getColumnReader(String columnName)
- throws IOException {
+ public ColumnReader getColumnReader(String columnName) {
Review Comment:
The use case for the method is to get a column reader from a input file.
The client needs to know if the input file doesn't contain the column. If we
throw an IOException in such a case, we can't know if there is an exception in
creating the reader or if the column doesn't exist.
Thus returning null can help the client differentiate that.
Currently PinotSegmentColumnReaderFactory creates a DefaultValueColumnReader
if it doesn't exist. Ideally it shouldn't do that. The client should handle
case where the input doesn't have the column (in some cases it might make sense
to fail or not read that column etc)
I am not making the changes to remove DefaultValueColumnReader from
PinotSegmentColumnReaderFactory because it might break something if some
external code depends on that.
However other factories that we will create will return null if a column
doesn't exist. So made the change here too to return null. This shouldn't break
anything major if some external code is using this factory
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/ColumnarDataSource.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * <p>
+ * This interface is designed to support the creation of multiple {@link
ColumnReaderFactory} instances
+ * from the same underlying data source. This capability is required because
different processing contexts
+ * may require distinct column reader objects for the same input file and
column. For example:
+ * <ul>
+ * <li>Different processing stages may need independent iterators over the
same column</li>
+ * <li>Parallel processing threads may each require their own column
readers</li>
+ * </ul>
+ * <p>
+ */
+public interface ColumnarDataSource extends Closeable {
Review Comment:
does the java doc in the class and createColumnReaderFactory() help in
answering this ?
ColumnarDataSource is similar to RecordReaderFileConfig that it holds
required attributes to initialise the ColumnReaderFactory / ColumnReader
For the same input attributes (file name, etc), we want to create
independent instances of ColumnReaderFactory / ColumnReader
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]