qlong opened a new pull request, #16715: URL: https://github.com/apache/iceberg/pull/16715
**Change**

This PR is part of the work to support variant extraction pushdown. The core change is to engineschema, which now maps slots to paths in a variant.

- Add SparkVariantExtractionScanBuilder implementing SupportsPushDownVariantExtractions so Spark can push variant_get paths from Filter/Project nodes into Iceberg scans.
- Gate the feature behind the new **spark.sql.iceberg.variant-extraction-push-down.enabled** setting (default on).
- Use an all-or-nothing batch policy: decline the entire batch if any extraction has an unsupported path, has an unsupported target type, references a non-variant column, or is a full-variant slot (expectedDataType = VariantType, path $).
- Avoid partial scan rewrites, which break multi-variant tables and plans where a variant_get above a join/aggregate barrier still references the original column.
- Override readSchema() on batch query scans to expose annotated extraction structs to executors.
- Add TestVariantShreddingPushdown to cover DSv2 plan shape and query correctness.

issue: https://github.com/apache/iceberg/issues/16448

**Notes for reviewers**

- PathUtil.java is mostly copied from the existing PR https://github.com/apache/iceberg/pull/15384. I will rebase once that PR is merged.
- Requires #16714 for end-to-end shredded column reads, so that PR should be merged first. This PR then enables the full shredded read feature.

**Test**

See the performance improvements in #16714.
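
For reviewers who want the all-or-nothing batch policy in concrete form, here is a minimal Java sketch. It is not the code in this PR: the class, the Extraction type, the set of supported target types, and the path rule are all hypothetical and chosen only for illustration. The only parts taken from the description are the four decline conditions and the rule that one failing extraction declines the whole batch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the PR or from Spark/Iceberg APIs.
final class VariantBatchPolicySketch {

  // One requested extraction: a column, a path inside it, and the type expected back.
  static final class Extraction {
    final String column;
    final String path;
    final String expectedType;

    Extraction(String column, String path, String expectedType) {
      this.column = column;
      this.path = path;
      this.expectedType = expectedType;
    }
  }

  // Assumed for illustration only; the PR description does not list the supported types.
  static final Set<String> SUPPORTED_TYPES =
      new HashSet<>(Arrays.asList("string", "long", "double", "boolean"));

  // Assumed path rule for illustration: rooted at "$" and free of wildcard segments.
  static boolean isSupportedPath(String path) {
    return path.startsWith("$") && !path.contains("*");
  }

  // A full-variant slot asks for the whole value: variant type at the root path.
  static boolean isFullVariantSlot(Extraction e) {
    return "variant".equals(e.expectedType) && "$".equals(e.path);
  }

  // An extraction qualifies only if it passes all four checks from the description.
  static boolean isPushable(Extraction e, Set<String> variantColumns) {
    return variantColumns.contains(e.column)
        && !isFullVariantSlot(e)
        && isSupportedPath(e.path)
        && SUPPORTED_TYPES.contains(e.expectedType);
  }

  // All-or-nothing: return the whole batch if every extraction qualifies,
  // otherwise return an empty list so that nothing is pushed down.
  static List<Extraction> acceptBatch(List<Extraction> batch, Set<String> variantColumns) {
    for (Extraction e : batch) {
      if (!isPushable(e, variantColumns)) {
        return Collections.emptyList();
      }
    }
    return batch;
  }
}
```

In this sketch a batch that mixes a pushable extraction with a full-variant slot returns an empty list rather than the pushable subset, which mirrors the stated goal of avoiding partial scan rewrites.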
