[PR] Update Spark Parquet vectorized read tests to use Iceberg Record instead of Avro GenericRecord [iceberg]

via GitHub Mon, 28 Apr 2025 12:17:43 -0700


amogh-jahagirdar opened a new pull request, #12925:
URL: https://github.com/apache/iceberg/pull/12925


   This change updates the Spark Parquet vectorized read tests to write and 
validate against Iceberg generic records instead of Avro generic records. The 
Iceberg generic record is the interface these tests should target, since it 
avoids the intricacies of Avro data types, which currently leak into the 
expectations the tests build. Refer to 
https://github.com/apache/iceberg/commit/ab92d6e66b61195a8e9845d3eb592f3e67ae67e1#diff-e649f357cf9965d322086f95bc78451fa1e61e4612f3d8af2114d93a1bd657aa
 for similar changes made for the non-vectorized Parquet reader.
   
   This refactoring prepares for the row lineage changes required for 
the vectorized reader. Since quite a few changes are involved here, I've 
separated out the test changes first so that the row lineage changes are 
easier to review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


