[PR] Spark: Implement variant extraction pushdown for shredded VARIANT columns [iceberg]

via GitHub Sun, 07 Jun 2026 17:30:52 -0700


qlong opened a new pull request, #16715:
URL: https://github.com/apache/iceberg/pull/16715


   **Change**
   
   This PR is part of the work to support variant extraction pushdown. The core 
change is to the engine schema, which now maps slots to paths in the variant. 
   
   - Add SparkVariantExtractionScanBuilder implementing 
SupportsPushDownVariantExtractions so Spark can push variant_get paths from 
Filter/Project nodes into Iceberg scans.
   - Gate behind new **spark.sql.iceberg.variant-extraction-push-down.enabled** 
(default on).
   - Use an all-or-nothing batch policy: decline the entire batch if any 
extraction has an unsupported path or unsupported target type, references a 
non-variant column, or is a full-variant slot (expectedDataType = VariantType, 
path $).
   - Avoid partial scan rewrites that break multi-variant tables and plans 
where variant_get above join/aggregate barriers still references the original 
column.
   - Override readSchema() on batch query scans to expose annotated extraction 
structs to executors.
   - Add TestVariantShreddingPushdown for DSv2 plan shape and query correctness.
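
   The all-or-nothing policy above can be sketched roughly as follows. This is 
a minimal, standalone illustration and not the code in this PR: the 
`Extraction` class, the `VARIANT_COLUMNS` and `SUPPORTED_TYPES` sets, and the 
path regex are all hypothetical stand-ins, and the real checks on Spark and 
Iceberg types may differ.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only; none of these names are Spark or Iceberg APIs.
class AllOrNothingPushdownSketch {

  /** Hypothetical stand-in for one requested variant extraction. */
  static final class Extraction {
    final String column;
    final String path;
    final String targetType;

    Extraction(String column, String path, String targetType) {
      this.column = column;
      this.path = path;
      this.targetType = targetType;
    }
  }

  // Assumed for illustration: which columns are VARIANT and which target types are supported.
  static final Set<String> VARIANT_COLUMNS = Set.of("payload");
  static final Set<String> SUPPORTED_TYPES = Set.of("int", "long", "double", "string");

  /** Assumption: only simple object-field paths such as $.a.b are supported. */
  static boolean isSupportedPath(String path) {
    return path.matches("\\$(\\.[A-Za-z_][A-Za-z0-9_]*)+");
  }

  /** One extraction is pushable only if it passes every check. */
  static boolean isPushable(Extraction e) {
    boolean fullVariantSlot = e.targetType.equals("variant") && e.path.equals("$");
    return VARIANT_COLUMNS.contains(e.column)
        && !fullVariantSlot
        && isSupportedPath(e.path)
        && SUPPORTED_TYPES.contains(e.targetType);
  }

  /** Accepts the whole batch, or declines it entirely by returning an empty list. */
  static List<Extraction> pushDown(List<Extraction> batch) {
    boolean allPushable = batch.stream().allMatch(AllOrNothingPushdownSketch::isPushable);
    return allPushable ? batch : List.of();
  }
}
```

   In this sketch a single full-variant slot, such as `("payload", "$", 
"variant")`, causes the whole batch to be declined, so the scan is never 
partially rewritten.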
   
   issue: https://github.com/apache/iceberg/issues/16448
   
   **Notes for reviewers**
   - PathUtil.java is mostly copied from the existing PR 
https://github.com/apache/iceberg/pull/15384; I will rebase once that PR is 
merged.
   - Requires #16714 for end-to-end shredded column reads, so that PR should be 
merged first. This PR enables the full shredded read feature. 
   
   **Test**
   
   See the performance improvements reported in #16714.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

