slessard commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1776485623
########## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java: ########## @@ -140,12 +141,18 @@ public static class ConstantVectorHolder<T> extends VectorHolder { private final int numRows; public ConstantVectorHolder(int numRows) { + super(new NullVector("_dummy_", numRows), null, new NullabilityHolder(numRows)); + nullabilityHolder().setNulls(0, numRows); this.numRows = numRows; this.constantValue = null; } public ConstantVectorHolder(Types.NestedField icebergField, int numRows, T constantValue) { - super(icebergField); + super( + new NullVector(icebergField.name(), numRows), Review Comment: @nastra OK, Now I see what's going on. The Spark tests were failing before I added a the VectorHolderTest class. I added that class because I realized that there are three different ways to end up with a dummy holder. I ran that test class against the main branch and all the tests passed. When I ran that test class against my branch some of the tests for the isDummy method failed. I then changed the isDummy method to correctly handle the three cases 1. The dummy holder is a DeletedVectorHolderInstance via the deletedVectorHolder() method. 2. The dummy holder is a ConstantVectorHolder created via the dummyHolder() method, i.e. vector has no assigned value 3. the dummy holder is a Constant vector holder created via the constantHolder() method, i.e. vector has an assigned value For DeletedVectorHolder its `vector` instance variable is always `null`. For ConstantVectorHolder, regardless of whether its constant value is `"a"` or `null`, its `vector` instance variable is always a `NullVector` instance. I updated the `isDummy()` method to handle those two cases. This made the three unit tests related to the isDummy method pass. Fixing the isDummy method seems to have fixed the Spark tests. The only remaining issue (possibly minor enough to ignore) is that ConstantVectorHolder, regardless of whether its constant value is `"a"` or `null`, its `vector` instance variable is always a `NullVector` instance. This seems a bit misleading, especially when the constant value is `"a"`. My suggested patch above fixes this issue to make ConstantVectorHolder's `vector` have a `null` value when its constant value is `"a"`. Is this worth fixing? If yes, then my suggested patch is needed. If no, then my suggested patch is not needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org