Re: [PR] feat: add Spark-compatible arrays_zip function [datafusion]

via GitHub Thu, 25 Jun 2026 18:37:25 -0700


CuteChuanChuan commented on PR #22473:
URL: https://github.com/apache/datafusion/pull/22473#issuecomment-4805663597


   > Thanks @CuteChuanChuan, I would expect something like the below. Just extend the existing `arrays_zip_inner` to accept field names and, in datafusion-spark, create another constructor that accepts field names.
   > 
   > 
   > 
   > The Spark caller will create the function with the necessary field names.
   > 
   > 
   > 
   > It can be a constructor or a `.with_field_names` method, to avoid recreating the function.
   > 
   > 
   > 
   > 
   > 
   >  ```mermaid
   > 
   >   flowchart TB
   > 
   >       subgraph spark["datafusion-spark"]
   > 
   >           SparkUDF["SparkArraysZip (UDF)<br/>function/array/arrays_zip.rs"]
   > 
   >           SparkNew["new()<br/>names from arg_fields"]
   > 
   >           SparkCtor["with_field_names(Vec&lt;String&gt;)<br/>caller-supplied names"]
   > 
   >           SparkMod["function/array/mod.rs<br/>+ pub mod arrays_zip<br/>+ make_udf_function!<br/>+ export_functions!<br/>+ functions() entry"]
   > 
   > 
   > 
   >           SparkUDF --> SparkNew
   > 
   >           SparkUDF --> SparkCtor
   > 
   >           SparkMod -.registers.-> SparkUDF
   > 
   >       end
   > 
   > 
   > 
   >       subgraph nested["datafusion-functions-nested"]
   > 
   >           ArraysZipUDF["ArraysZip (UDF)<br/>names: '1','2','3'..."]
   > 
   >           Wrapper["arrays_zip_inner(args)<br/>(private wrapper)<br/>generates default names"]
   > 
   >           Pub["pub fn arrays_zip_inner_with_names<br/>(args, field_names)"]
   > 
   >           Perfect["try_perfect_list_zip<br/>(args, field_names)"]
   > 
   > 
   > 
   >           ArraysZipUDF --> Wrapper
   > 
   >           Wrapper --> Pub
   > 
   >           Pub --> Perfect
   > 
   >       end
   > 
   > 
   > 
   >       SparkNew -->|resolve names<br/>then call| Pub
   > 
   >       SparkCtor -->|resolve names<br/>then call| Pub
   > 
   >   ```
   > 
   > 
   > 
   >   ## Name resolution inside `SparkArraysZip`
   > 
   > 
   > 
   >   ```mermaid
   > 
   >   flowchart LR
   > 
   >       Start["SparkArraysZip.field_names"]
   > 
   >       Some["Some(names)"]
   > 
   >       None["None"]
   > 
   >       Validate["validate length<br/>matches args"]
   > 
   >       FromArgs["arg_fields[i].name()<br/>for each i"]
   > 
   >       Use["use names"]
   > 
   > 
   > 
   >       Start --> Some
   > 
   >       Start --> None
   > 
   >       Some --> Validate --> Use
   > 
   >       None -->|Spark default:<br/>column/alias names| FromArgs --> Use
   > 
   >   ```
   
   Got it. Thanks for providing these details.❤️ I will give it a try.
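
   To check my reading of the name-resolution flow in the quoted diagrams, here is a rough std-only sketch. The helper names `default_field_names` and `resolve_field_names` are hypothetical and are not part of the DataFusion API; plain strings stand in for `arg_fields[i].name()`, and the error type is a `String` instead of a DataFusion error.

```rust
/// Default names used when no names are supplied: "1", "2", "3", ...
/// (mirrors the `names: '1','2','3'...` note on the `ArraysZip` node).
/// Hypothetical helper, for illustration only.
fn default_field_names(n: usize) -> Vec<String> {
    (1..=n).map(|i| i.to_string()).collect()
}

/// Hypothetical helper modelling the `SparkArraysZip.field_names` decision:
/// - `Some(names)`: validate that the length matches the number of arguments,
///   then use the caller-supplied names.
/// - `None`: fall back to the argument names (column/alias names).
fn resolve_field_names(
    explicit: Option<&[String]>,
    arg_names: &[&str],
) -> Result<Vec<String>, String> {
    match explicit {
        Some(names) => {
            if names.len() != arg_names.len() {
                return Err(format!(
                    "arrays_zip: expected {} field names, got {}",
                    arg_names.len(),
                    names.len()
                ));
            }
            Ok(names.to_vec())
        }
        None => Ok(arg_names.iter().map(|n| n.to_string()).collect()),
    }
}
```

   Under these assumptions, `resolve_field_names(None, &["a", "b"])` falls back to the argument names, while a `Some(..)` list of the wrong length returns an error before anything would be passed on to `arrays_zip_inner_with_names`.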


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


