[I] [tiering] Tiering Job Performance: Read-Write Pipeline Optimization [fluss]

via GitHub Mon, 23 Mar 2026 04:38:37 -0700


beryllw opened a new issue, #2915:
URL: https://github.com/apache/fluss/issues/2915


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.
   
   
   ### Motivation
   
   # Problem Description
   When a Tiering job runs under high write throughput, data synchronization cannot keep up with the write rate. Root cause analysis reveals two main issues:
   
   1. Parallelism is bounded by bucket count: Tiering job parallelism maps 1:1 to the bucket count, which limits scalability.
   2. Reads and writes are not pipelined: reading from Fluss and writing to Paimon execute sequentially, resulting in low CPU utilization.
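To make the first point concrete: if each split covers exactly one bucket, a table with 4 buckets can keep at most 4 readers busy, whatever parallelism is configured. The sketch below shows one hypothetical way to lift that bound by cutting each bucket's pending log offset range into several smaller splits. It is an illustration under assumptions, not Fluss code: `OffsetRangeSplit`, `split_bucket`, `plan_splits` and `max_records_per_split` are invented names, and it assumes a bucket's pending data can be addressed as a half-open offset range `[start, end)` that independent readers may consume separately. Whether that assumption holds (for example for primary-key tables, where changelog order within a bucket matters) is an open design question.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OffsetRangeSplit:
    """Hypothetical split: the half-open offset range [start_offset, end_offset) of one bucket."""
    bucket: int
    start_offset: int
    end_offset: int


def split_bucket(bucket: int, start: int, end: int,
                 max_records_per_split: int) -> List[OffsetRangeSplit]:
    """Cut [start, end) into consecutive, non-overlapping sub-ranges of at most
    max_records_per_split records each (assumption: offsets are dense)."""
    if max_records_per_split <= 0:
        raise ValueError("max_records_per_split must be positive")
    splits = []
    cursor = start
    while cursor < end:
        upper = min(cursor + max_records_per_split, end)
        splits.append(OffsetRangeSplit(bucket, cursor, upper))
        cursor = upper
    return splits


def plan_splits(bucket_ranges: Dict[int, Tuple[int, int]],
                max_records_per_split: int) -> List[OffsetRangeSplit]:
    """Plan splits for every bucket; bucket_ranges maps bucket id -> (start, end)."""
    plan = []
    for bucket, (start, end) in sorted(bucket_ranges.items()):
        plan.extend(split_bucket(bucket, start, end, max_records_per_split))
    return plan
```

For two buckets with 1000 pending records each and `max_records_per_split = 250`, `plan_splits` yields 8 splits, so up to 8 subtasks could read in parallel instead of 2. A per-split record cap keeps split sizes bounded as throughput grows; the trade-off is more splits to track and commit.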
   
   # Root Cause Analysis
   
   1. Split Granularity Equals Bucket Granularity: each split covers exactly one bucket, so the maximum parallelism can never exceed the number of buckets.
   2. Sequential Read-Write Pattern: the current implementation reads from Fluss and writes to Paimon synchronously, so neither stage makes progress while the other is running.
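The second point can be illustrated with a generic two-stage producer/consumer pattern. If reading one batch takes time r and writing it takes time w, a strictly sequential loop over n batches costs roughly n * (r + w), while overlapping the two stages costs about n * max(r, w) + min(r, w). The sketch below is a hypothetical Python illustration, not the Fluss tiering implementation: `batches` stands in for whatever iterator reads from Fluss, `write` stands in for the Paimon writer, and `run_sequential`, `run_pipelined` and `capacity` are invented names. A bounded queue lets the reader run ahead of the writer by at most `capacity` batches, which provides back-pressure when writing is the slower stage.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of the input stream


def run_sequential(batches: Iterable, write: Callable) -> None:
    """Baseline: read a batch, write it, then read the next one (no overlap)."""
    for batch in batches:
        write(batch)


def run_pipelined(batches: Iterable, write: Callable, capacity: int = 4) -> None:
    """Read on a background thread while the calling thread writes.

    The bounded queue decouples the stages: the reader blocks once it is
    `capacity` batches ahead, so memory use stays bounded.
    """
    handoff: "queue.Queue" = queue.Queue(maxsize=capacity)
    errors: List[Exception] = []

    def reader() -> None:
        try:
            for batch in batches:
                handoff.put(batch)
        except Exception as exc:  # hand reader failures over to the writer side
            errors.append(exc)
        finally:
            handoff.put(_END)  # always unblock the writer loop

    # Daemon thread: if write() raises, the reader may stay blocked on put(),
    # which is acceptable for a sketch but would need cancellation in real code.
    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    while True:
        batch = handoff.get()
        if batch is _END:
            break
        write(batch)
    thread.join()
    if errors:
        raise errors[0]
```

A Java version would follow the same shape, for example with a `java.util.concurrent.BlockingQueue` between a reader thread and the writing thread. Python threads are used here only to keep the sketch short, and they overlap well only when both stages are I/O-bound.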
   
   
   ### Solution
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Willingness to contribute
   
   - [ ] I'm willing to submit a PR!

