athlcode commented on PR #22100: URL: https://github.com/apache/datafusion/pull/22100#issuecomment-4825260055
> Tests are fine, thanks @athlcode. Please help me to understand why we need to keep anything spark related in the core?

The mechanism has to live in core, but it isn't Spark-specific.

NullHandling::PreserveAndExpandEmpty is just a new option on UnnestOptions in datafusion-common, where UnnestOptions already lives, and UnnestExec in datafusion-physical-plan is the only place the per-row logic can run. Expr::Unnest.outer is a new field on a core Expr variant; downstream crates can't add fields to a pub enum variant in Rust, so the flag has to live where the variant is defined.

Neither references Spark by name. They're the generic mechanism for "unnest that preserves empty arrays as NULL rows," which Hive EXPLODE OUTER, Snowflake FLATTEN(OUTER => true), and BigQuery UNNEST ... WITH OFFSET all need too.

The Spark-flavored surface (the explode / explode_outer SQL aliases and the allow_multiple_generators config) is the only part that's truly dialect-specific. Happy to move those into datafusion-spark as a planner extension and a custom config registered via Extensions if you'd prefer the core diff to be purely the mechanism.
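
To make the "preserves empty arrays as NULL rows" semantics concrete, here is a minimal standalone sketch of the per-row behavior. It is not the code in this PR and does not use any DataFusion types: the `unnest` function, the `Drop` and `Preserve` variant names, and the use of plain `Vec<Option<i64>>` in place of Arrow arrays are all assumptions made for illustration. Only the `PreserveAndExpandEmpty` name comes from the discussion above.

```rust
/// Hypothetical model of per-row null handling for unnest.
/// `Drop` and `Preserve` are illustrative names (assumptions);
/// only `PreserveAndExpandEmpty` is named in the PR discussion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NullHandling {
    /// NULL lists and empty lists both produce no output rows.
    Drop,
    /// A NULL list produces one NULL row; an empty list produces no rows.
    Preserve,
    /// A NULL list and an empty list each produce one NULL row.
    PreserveAndExpandEmpty,
}

/// Flatten a column of optional lists into a column of optional values.
/// Plain vectors stand in for Arrow list arrays (assumption for brevity).
pub fn unnest(rows: &[Option<Vec<i64>>], mode: NullHandling) -> Vec<Option<i64>> {
    let mut out = Vec::new();
    for row in rows {
        match row {
            // NULL list: keep a NULL row unless the mode drops nulls.
            None => {
                if mode != NullHandling::Drop {
                    out.push(None);
                }
            }
            // Empty list: only the "outer" mode emits a NULL row.
            Some(list) if list.is_empty() => {
                if mode == NullHandling::PreserveAndExpandEmpty {
                    out.push(None);
                }
            }
            // Non-empty list: one output row per element in every mode.
            Some(list) => out.extend(list.iter().copied().map(Some)),
        }
    }
    out
}
```

With input rows `[1, 2]`, `[]`, and NULL, this sketch yields two rows under `Drop`, three under `Preserve` (the trailing NULL), and four under `PreserveAndExpandEmpty` (one extra NULL for the empty list). Nothing in that distinction depends on a particular SQL dialect, which is the point of keeping the option in core.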
