chasebradford commented on code in PR #7833:
URL: https://github.com/apache/iceberg/pull/7833#discussion_r1232865121


##########
data/src/test/java/org/apache/iceberg/data/TestGenericReaderDeletes.java:
##########
@@ -46,12 +47,13 @@ protected void dropTable(String name) {
 
   @Override
   public StructLikeSet rowSet(String name, Table table, String... columns) throws IOException {
-    StructLikeSet set = StructLikeSet.create(table.schema().asStruct());
+    Types.StructType recordSchema = table.schema().select(columns).asStruct();

Review Comment:
   The record that is produced and passed to the transform operator on line 56 
contains only the `id` and `data` fields. The StructLikeSet, however, is 
constructed from the full table schema, which has 3 fields now that I added the 
optional `bin` field. As a result, the StructLikeSet tries to compute hashes 
for all 3 fields, even though the record passed to the transform holds only 2 
field values.
   
   I don't fully understand why this isn't already failing in the top-of-tree 
code. The unit test uses this method to select all remaining records after 
creating a delete file, but it selects only the `id` column. My guess is that, 
even though only `id` is supposed to be in the read projection, the `data` 
column is added to the internal read schema so that the equality delete can be 
applied. Both fields are therefore available, and the StructLikeSet doesn't 
trigger the IndexOutOfBoundsException.
   
   Building the StructLikeSet from a schema based on the requested columns, 
rather than the full table schema, keeps it from accessing fields that aren't 
supposed to be in the record.
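
   The failure mode described above can be illustrated with a minimal, 
self-contained sketch. It deliberately does not use Iceberg classes: 
`ProjectionHashSketch`, `hashRecord`, and `failsToHash` are hypothetical names, 
a plain `Object[]` stands in for the record, and an `int` field count stands in 
for the schema width. It only models the assumption that the set hashes every 
field of the schema it was created with.

```java
// Simplified stand-in for the situation in this comment; NOT Iceberg code.
final class ProjectionHashSketch {

  // Hypothetical hash over the first `fieldCount` positions of a record,
  // mimicking a set that hashes every field of the schema it was built from.
  static int hashRecord(Object[] record, int fieldCount) {
    int result = 17;
    for (int pos = 0; pos < fieldCount; pos++) {
      // Throws ArrayIndexOutOfBoundsException when the record is narrower
      // than the schema the hash was configured with.
      Object value = record[pos];
      result = 31 * result + (value == null ? 0 : value.hashCode());
    }
    return result;
  }

  // Returns true if hashing fails because the schema is wider than the record.
  static boolean failsToHash(Object[] record, int fieldCount) {
    try {
      hashRecord(record, fieldCount);
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Object[] record = {1L, "a"}; // stands in for a record holding id and data

    // Schema with 3 fields (id, data, bin) against a 2-value record.
    System.out.println("3-field schema fails: " + failsToHash(record, 3));
    // Schema with 2 fields matches the record width.
    System.out.println("2-field schema fails: " + failsToHash(record, 2));
    // Schema narrowed to the requested column only.
    System.out.println("1-field schema fails: " + failsToHash(record, 1));
  }
}
```

   In this model, a schema narrower than or equal to the record never reads 
past the end of the record. That mirrors the suspected reason the existing 
test passes and why deriving the set's schema from the requested columns avoids 
the exception.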



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

