RussellSpitzer commented on code in PR #10953:
URL: https://github.com/apache/iceberg/pull/10953#discussion_r1803845635


##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java:
##########
@@ -140,12 +141,21 @@ public static class ConstantVectorHolder<T> extends 
VectorHolder {
     private final int numRows;
 
     public ConstantVectorHolder(int numRows) {
+      super(new NullVector("_dummy_", numRows), null, new 
NullabilityHolder(numRows));
+      nullabilityHolder().setNulls(0, numRows);
       this.numRows = numRows;
       this.constantValue = null;
     }
 
     public ConstantVectorHolder(Types.NestedField icebergField, int numRows, T 
constantValue) {
-      super(icebergField);
+      super(

Review Comment:
   Ok having gone through the code I think I may have a strategy here but 
please let me know what i'm missing in this interpretation. 
   
   ConstantVectorHolder is used as a dummy holder prior to this commit. It has 
no *vector* , column descriptor, etc ... everything is null. The current 
approach sometimes makes a ConstantVectorHolder which sometimes *does* have 
something inside of it. This is done so that GenericArrowVectorAccessorFactory 
will have something to work with.
   
   From what I can see in GenericArrowVectorAccessorFactory is that it doesn't 
work with ConstantVectorHolders at all. It currently can only handle cases in 
which holder.vector() is a non null type and matches some known vector type. 
Spark on the other hand, does not actually use this path and when it sees a 
ConstantVectorHolder it instead creates a o.a.i.s.ConstantColumnVector. This is 
why I don't think Spark has any issue with this. (cc @amogh-jahagirdar )
   
   Now this makes me think what we really need is to just fix 
GenericArrowVectorAccessorFactory to handle the case where the "vector" is null 
and in that circumstance attempt to cast the VectorHolder to a 
ConstantVectorHolder and create the appropriate vector type or accessor. I 
think we have a few options to do this.
   
   I think the easiest may be to just create an accessor that looks like the 
Spark ConstantColumnVector and just use that as our generic accessor. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to