sl255051 commented on PR #10953:
URL: https://github.com/apache/iceberg/pull/10953#issuecomment-2307968000

   Thank you, Eduard, for your help. I will be on vacation next week. I will 
pick this up again when I return on September 3.
   
   -Steve Lessard
   
   
   From: Eduard Tudenhoefner ***@***.***>
   Date: Thursday, August 22, 2024 at 3:46 AM
   To: apache/iceberg ***@***.***>
   Cc: Lessard, Steve ***@***.***>, Mention ***@***.***>
   Subject: [EXTERNAL] Re: [apache/iceberg] DRAFT - Issue 10275 - Reward 
support for nulls (PR #10953)
   [CAUTION: External Email]
   
   
   @nastra commented on this pull request.
   
   @sl255051<https://github.com/sl255051> I spent some time understanding the 
problem and going through the code and fixing it myself (without first looking 
at this PR). I think you're on the right track here to handle the root cause 
and we should have a NullAccessor that internally tracks a NullVector.
   
   I would suggest to squash all the commits to a single one and then go 
through my comments
   
   ________________________________
   
   In 
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java<https://github.com/apache/iceberg/pull/10953#discussion_r1726810395>:
   
   >
   
        @Override
   
        public VectorHolder read(VectorHolder reuse, int numValsToRead) {
   
   -      return VectorHolder.dummyHolder(numValsToRead);
   
   +      NullVector vector = new NullVector(icebergField().name(), 
numValsToRead);
   
   +      return new VectorHolder.NullVectorHolder(vector, icebergField(), 
numValsToRead);
   
   there are places in the codebase that relied on ConstantVectorHolder so I'm 
not sure whether we should introduce a separate NullVectorHolder.
   I tried it with the following diff, which should work too:
   
   --- 
a/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java
   
   +++ 
b/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java
   
   @@ -19,6 +19,7 @@
   
    package org.apache.iceberg.arrow.vectorized;
   
   
   
    import org.apache.arrow.vector.FieldVector;
   
   +import org.apache.arrow.vector.NullVector;
   
    import org.apache.iceberg.MetadataColumns;
   
    import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
   
    import org.apache.iceberg.types.Type;
   
   @@ -138,16 +139,19 @@ public class VectorHolder {
   
      public static class ConstantVectorHolder<T> extends VectorHolder {
   
        private final T constantValue;
   
        private final int numRows;
   
   +    private final FieldVector vector;
   
   
   
        public ConstantVectorHolder(int numRows) {
   
          this.numRows = numRows;
   
          this.constantValue = null;
   
   +      this.vector = new NullVector("_dummy_", numRows);
   
        }
   
   
   
        public ConstantVectorHolder(Types.NestedField icebergField, int 
numRows, T constantValue) {
   
          super(icebergField);
   
          this.numRows = numRows;
   
          this.constantValue = constantValue;
   
   +      this.vector = new NullVector(icebergField.name(), numRows);
   
        }
   
   
   
        @Override
   
   @@ -158,6 +162,11 @@ public class VectorHolder {
   
        public Object getConstant() {
   
          return constantValue;
   
        }
   
   +
   
   +    @Override
   
   +    public FieldVector vector() {
   
   +      return vector;
   
   +    }
   
   ________________________________
   
   In 
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java<https://github.com/apache/iceberg/pull/10953#discussion_r1726812044>:
   
   > @@ -131,6 +131,20 @@ public boolean isDummy() {
   
        return vector == null;
   
      }
   
   
   
   +  public static class NullVectorHolder extends VectorHolder {
   
   see my other comment on whether we should actually introduce this class vs 
modify ConstantVectorHolderto hold a NullVector
   
   ________________________________
   
   In 
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java<https://github.com/apache/iceberg/pull/10953#discussion_r1726814320>:
   
   > @@ -451,10 +452,6 @@ public String toString() {
   
        return columnDescriptor.toString();
   
      }
   
   
   
   -  public static VectorizedArrowReader nulls() {
   
   I don't think we should remove this method. How about rather doing this:
   
   --- 
a/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
   
   +++ 
b/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java
   
   @@ -455,6 +455,10 @@ public class VectorizedArrowReader implements 
VectorizedReader<VectorHolder> {
   
        return NullVectorReader.INSTANCE;
   
      }
   
   
   
   +  public static VectorizedArrowReader nulls(Types.NestedField icebergField) 
{
   
   +    return new NullVectorReader(icebergField);
   
   +  }
   
   +
   
      public static VectorizedArrowReader positions() {
   
        return new PositionVectorReader(false);
   
      }
   
   @@ -464,11 +468,15 @@ public class VectorizedArrowReader implements 
VectorizedReader<VectorHolder> {
   
      }
   
   
   
      private static final class NullVectorReader extends VectorizedArrowReader 
{
   
   -    private static final NullVectorReader INSTANCE = new NullVectorReader();
   
   +    private static final NullVectorReader INSTANCE = new 
NullVectorReader(null);
   
   +
   
   +    private NullVectorReader(Types.NestedField icebergField) {
   
   +      super(icebergField);
   
   +    }
   
   
   
        @Override
   
        public VectorHolder read(VectorHolder reuse, int numValsToRead) {
   
   -      return VectorHolder.dummyHolder(numValsToRead);
   
   +      return new VectorHolder.ConstantVectorHolder<>(icebergField(), 
numValsToRead, null);
   
        }
   
   ________________________________
   
   In 
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedReaderBuilder.java<https://github.com/apache/iceberg/pull/10953#discussion_r1726816658>:
   
   > @@ -97,7 +98,7 @@ public VectorizedReader<?> message(
   
          } else if (reader != null) {
   
            reorderedFields.add(reader);
   
          } else {
   
   -        reorderedFields.add(VectorizedArrowReader.nulls());
   
   based on my other comments, this should most likely call 
VectorizedArrowReader.nulls(field)
   
   ________________________________
   
   In 
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java<https://github.com/apache/iceberg/pull/10953#discussion_r1726817036>:
   
   > @@ -19,7 +19,7 @@
   
    package org.apache.iceberg.arrow.vectorized;
   
   
   
    import static org.apache.iceberg.Files.localInput;
   
   -import static org.assertj.core.api.Assertions.assertThat;
   
   +import static org.assertj.core.api.Assertions.*;
   
   we don't support * imports
   
   ________________________________
   
   In 
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/GenericArrowVectorAccessorFactoryTest.java<https://github.com/apache/iceberg/pull/10953#discussion_r1726819029>:
   
   > +import static org.mockito.Mockito.mock;
   
   +import static org.mockito.Mockito.when;
   
   +
   
   +import java.math.BigDecimal;
   
   +import java.util.function.Supplier;
   
   +import org.apache.arrow.vector.IntVector;
   
   +import org.apache.iceberg.types.Types;
   
   +import org.apache.parquet.column.ColumnDescriptor;
   
   +import org.apache.parquet.schema.PrimitiveType;
   
   +import org.junit.jupiter.api.BeforeEach;
   
   +import org.junit.jupiter.api.Test;
   
   +import org.mockito.InjectMocks;
   
   +import org.mockito.Mock;
   
   +import org.mockito.MockitoAnnotations;
   
   +
   
   +class GenericArrowVectorAccessorFactoryTest {
   
   I'm not sure it's worth adding this test, unless you want to handle all 
vectors (which can also be a follow-up PR)
   
   —
   Reply to this email directly, view it on 
GitHub<https://github.com/apache/iceberg/pull/10953#pullrequestreview-2254160521>,
 or 
unsubscribe<https://github.com/notifications/unsubscribe-auth/BJP6DB7V72VBKNBACG2Z3F3ZSW6P5AVCNFSM6AAAAABMUT7GKSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDENJUGE3DANJSGE>.
   You are receiving this because you were mentioned.Message ID: ***@***.***>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to