Re: [PR] feat: Add support for Spark-compatible explode_outer function for arrays [datafusion]

via GitHub Sun, 28 Jun 2026 00:34:01 -0700


athlcode commented on PR #22100:
URL: https://github.com/apache/datafusion/pull/22100#issuecomment-4825260055


   > Tests are fine, thanks @athlcode. Please help me understand why we need 
to keep anything Spark-related in the core?
   
   The mechanism has to live in core, but it isn't Spark-specific. 
NullHandling::PreserveAndExpandEmpty is just a new option on UnnestOptions in 
datafusion-common, where UnnestOptions already lives, and UnnestExec in 
datafusion-physical-plan is the only place the per-row logic can run. 
Expr::Unnest.outer is a new field on a core Expr variant, and downstream crates 
can't add fields to a pub enum variant in Rust, so the flag has to live where 
the variant is defined.
   
   Neither references Spark by name. They're the generic mechanism for "unnest 
that emits a NULL row for an empty array," which Hive LATERAL VIEW OUTER 
explode(...), Snowflake FLATTEN(OUTER => TRUE), and BigQuery LEFT JOIN 
UNNEST(...) all need too.
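   
   To make the semantics concrete, here is a minimal standalone sketch of the 
per-row behaviour. It is not the code in this PR: the `NullHandling` enum 
below is a hypothetical stand-in that only borrows the variant name, the 
`Drop` variant and the `unnest` helper are invented for illustration, and it 
assumes a NULL list is treated like an empty one (as Spark's explode_outer 
does).

```rust
// Hypothetical stand-in for the null-handling option discussed above.
// This is an illustration only, not DataFusion's actual type.
#[derive(Clone, Copy, PartialEq, Debug)]
enum NullHandling {
    /// Drop rows whose list is NULL or empty (plain explode).
    Drop,
    /// Emit a single NULL row for a NULL or empty list (explode_outer).
    PreserveAndExpandEmpty,
}

/// Flatten a column of optional lists into a column of optional values.
/// `unnest` is a hypothetical helper, not a DataFusion function.
fn unnest(rows: &[Option<Vec<i32>>], mode: NullHandling) -> Vec<Option<i32>> {
    let mut out = Vec::new();
    for row in rows {
        match row {
            // Non-empty list: one output row per element.
            Some(items) if !items.is_empty() => {
                out.extend(items.iter().copied().map(Some));
            }
            // NULL or empty list: either dropped or kept as one NULL row.
            _ => {
                if mode == NullHandling::PreserveAndExpandEmpty {
                    out.push(None);
                }
            }
        }
    }
    out
}
```

   For the input `[[1, 2], [], NULL, [3]]`, `NullHandling::Drop` yields 
`1, 2, 3`, while `NullHandling::PreserveAndExpandEmpty` yields 
`1, 2, NULL, NULL, 3`. Nothing in the sketch depends on a particular SQL 
dialect.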
   
   The Spark-flavored surface (the explode / explode_outer SQL aliases and the 
allow_multiple_generators config) is the only part that's truly 
dialect-specific. Happy to move those into datafusion-spark as a planner 
extension and a custom config registered via Extensions if you'd prefer the 
core diff to be purely the mechanism.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

