[I] Support reading tiering source data as Arrow RecordBatch [fluss]

via GitHub Mon, 30 Mar 2026 05:48:40 -0700


luoyuxia opened a new issue, #2962:
URL: https://github.com/apache/fluss/issues/2962


   ## Search before asking
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   ## Description
   This issue tracks the tiering-source part of parent task #437, which is being split into sub-tasks.
   
   Today, the tiering source reads Fluss log data and converts it into downstream 
storage formats through row-oriented or storage-specific paths. To support a 
cleaner and more efficient Arrow-based pipeline, the tiering source should be 
able to read data directly as Arrow RecordBatch.
   
   This work would provide a reusable Arrow-native read path for tiering, and 
would also serve as the foundation for directly writing tiered data into 
Parquet in a later step.
   
   Possible scope:
   - add a tiering-source path that reads log data as Arrow RecordBatch;
   - define the batch lifecycle/ownership clearly to avoid Arrow memory leaks;
   - make the Arrow batch path reusable by downstream tiering writers.
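
The lifecycle/ownership item in the scope list above could be modelled roughly as follows. This is only a language-neutral sketch in plain Python, not Fluss or Arrow code: `TrackingAllocator`, `BatchHandle`, `LeakError`, and `tier_one_batch` are hypothetical names invented for illustration. The assumption is a single-owner rule: the reader creates a batch, hands ownership to the writer, and the owner releases it exactly once, so that closing the allocator can detect anything left outstanding.

```python
class LeakError(RuntimeError):
    """Raised when an allocator is closed while memory is still outstanding."""


class TrackingAllocator:
    """Hypothetical stand-in for an Arrow allocator: tracks unreleased bytes."""

    def __init__(self):
        self.outstanding_bytes = 0

    def allocate(self, nbytes):
        self.outstanding_bytes += nbytes

    def release(self, nbytes):
        self.outstanding_bytes -= nbytes

    def close(self):
        # Fail loudly instead of silently leaking.
        if self.outstanding_bytes != 0:
            raise LeakError(f"{self.outstanding_bytes} bytes still outstanding")


class BatchHandle:
    """Hypothetical single-owner handle for one batch read by the tiering source."""

    def __init__(self, allocator, nbytes, rows):
        self._allocator = allocator
        self._nbytes = nbytes
        self._rows = rows
        self._closed = False
        allocator.allocate(nbytes)

    @property
    def rows(self):
        if self._closed:
            raise ValueError("batch already released or transferred")
        return self._rows

    def transfer(self):
        """Move ownership to a new handle; this handle becomes unusable."""
        if self._closed:
            raise ValueError("batch already released or transferred")
        moved = object.__new__(BatchHandle)  # skip __init__: no second allocation
        moved._allocator = self._allocator
        moved._nbytes = self._nbytes
        moved._rows = self._rows
        moved._closed = False
        self._closed = True  # the old handle gives up ownership without releasing
        return moved

    def close(self):
        # Idempotent: only the current owner's first close releases memory.
        if not self._closed:
            self._closed = True
            self._allocator.release(self._nbytes)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def tier_one_batch(allocator, rows):
    """Reader produces a batch; the writer side takes ownership and releases it."""
    produced = BatchHandle(allocator, nbytes=8 * len(rows), rows=rows)
    with produced.transfer() as owned:  # released even if the writer raises
        return len(owned.rows)
```

In this model a downstream writer never has to guess whether it may release a batch: whoever holds the live handle owns it, and a forgotten release surfaces when the allocator is closed.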
   
   This is intended to be one sub-task of #437, while the Arrow-to-Parquet 
conversion itself is tracked separately.
   
   ## Willingness to contribute
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

