gitmodimo opened a new issue, #45434:
URL: https://github.com/apache/arrow/issues/45434

   ### Describe the enhancement requested
   
   I have been experimenting with the concept of pipe in Acero. It show great 
potential in expanding Acero flexibility and features. Initial draft and proof 
of concept is available here: 
   
   Consider `named_pipe` that:
   - has one sink (`pipe_sink` or `pipe_tee`)
   - has multiple sources (`pipe_source`)
   - each `ExecBatch` that enters sink (`pipe_sink`/`pipe_tee`) is duplicated 
to all of its outputs (`pipe_source`)
   - sinks are matched with sources by `pipe_name` in `Init` phase of exec plan
   
   `pipe_sink` - is a sink node that consumes RecordBatches and duplicates them 
onto all `pipe_source`s that match by name.
   `pipe_tee` - works exactly as `pipe_sink` and additionally forwards the 
batch to its output
   `pipe_source` - is pipe consumer node that can be considered as source node 
in terms of declaration
   
   Example of new flexibility given new nodes:
   ```
   Declaration main_query= Declaration::Sequence({
        {"source", ...},
        {"pipe_tee", PipeSinkNodeOptions{"named_pipe_1"}},
        {"filter", FilterNodeOptions{expr1},
        {"sink", ...}
   });
   
   Declaration extra_query= Declaration::Sequence({
        {"pipe_source", PipeSourceNodeOptions{"named_pipe_1", schema}}
        {"filter", FilterNodeOptions{expr2},
        {"sink", ...}
   });
   ...
   main_query.AddToPlan(plan.get());
   extra_query.AddToPlan(plan.get());
   ```
   Things left to do:
   - refactor the code and add `pipe_sink`
   - handle instances of `pipe_sources` consumers that do not have 
corresponding producers - produce error
   - probably more tests 
   - update doc
   
   Let me know what you think about this feature and whether there are any 
pitfalls that I missed. What additional test cases should I implement?
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to