slessard commented on code in PR #10953:
URL: https://github.com/apache/iceberg/pull/10953#discussion_r1776485623


##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java:
##########
@@ -140,12 +141,18 @@ public static class ConstantVectorHolder<T> extends 
VectorHolder {
     private final int numRows;
 
     public ConstantVectorHolder(int numRows) {
+      super(new NullVector("_dummy_", numRows), null, new 
NullabilityHolder(numRows));
+      nullabilityHolder().setNulls(0, numRows);
       this.numRows = numRows;
       this.constantValue = null;
     }
 
     public ConstantVectorHolder(Types.NestedField icebergField, int numRows, T 
constantValue) {
-      super(icebergField);
+      super(
+          new NullVector(icebergField.name(), numRows),

Review Comment:
   @nastra  OK, Now I see what's going on. The Spark tests were failing before 
I added a the VectorHolderTest class. I added that class because I realized 
that there are three different ways to end up with a dummy holder. I ran that 
test class against the main branch and all the tests passed. When I ran that 
test class against my branch some of the tests for the isDummy method failed. I 
then changed the isDummy method to correctly handle the three cases
   
   1.  The dummy holder is a DeletedVectorHolderInstance via the 
deletedVectorHolder() method.
   2. The dummy holder is a ConstantVectorHolder created via the dummyHolder() 
method, i.e. vector has no assigned value
   3. the dummy holder is a Constant vector holder created via the 
constantHolder() method, i.e. vector has an assigned value
   
   For DeletedVectorHolder its `vector` instance variable is always `null`.
   For ConstantVectorHolder, regardless of whether its constant value is `"a"` 
or `null`, its `vector` instance variable is always a `NullVector` instance.
   
   I updated the `isDummy()` method to handle those two cases. This made the 
three unit tests related to the isDummy method pass.
   
   Fixing the isDummy method seems to have fixed the Spark tests.
   
   The only remaining issue (possibly minor enough to ignore) is that 
ConstantVectorHolder, regardless of whether its constant value is `"a"` or 
`null`, its `vector` instance variable is always a `NullVector` instance. This 
seems a bit misleading, especially when the constant value is `"a"`. My 
suggested patch above fixes this issue to make ConstantVectorHolder's `vector` 
have a `null` value when its constant value is `"a"`. Is this worth fixing? If 
yes, then my suggested patch is needed. If no, then my suggested patch is not 
needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to