compasses opened a new issue, #11640:
URL: https://github.com/apache/doris/issues/11640

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   Currently, there are several ways to load data into Doris, such as broker 
load and stream load. From our perspective, however, all of them have 
shortcomings, for example:
   
   1. High resource cost: each tablet has multiple replicas, and each 
replica performs the data load separately.
   2. Cluster stability and performance are hard to guarantee: load jobs can 
cause high load and resource contention.
   3. Query latency fluctuates, and this kind of issue keeps recurring.
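
To make point 1 concrete, here is a rough back-of-the-envelope sketch. It assumes that load cost is roughly proportional to the number of segment builds, which is a simplification; the function name and the numbers are purely illustrative and not taken from Doris.

```python
def build_tasks_per_load(tablets: int, replicas: int, build_once: bool) -> int:
    """Count segment-build tasks for one load (illustrative model only).

    build_once=False models the current behaviour described above: every
    replica of every tablet builds its own copy of the data.
    build_once=True models building each tablet's segments a single time
    and letting the replicas fetch the result.
    """
    if tablets < 0 or replicas < 1:
        raise ValueError("tablets must be >= 0 and replicas must be >= 1")
    return tablets if build_once else tablets * replicas


# Hypothetical table: 100 tablets, 3 replicas each.
per_replica = build_tasks_per_load(100, 3, build_once=False)
shared = build_tasks_per_load(100, 3, build_once=True)
print(per_replica, shared)
```

Under this simplified model the build work scales with the replica count today, and would not if the segments were built once per tablet.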
   
   So we may want a new way to load data, such as lightweight read / write 
splitting, which can sustain high throughput for both writes and reads.
   
   Here we have only a very rough design, and many details still need to be 
clarified.
   
   The overall flow:
   
   
![image](https://user-images.githubusercontent.com/10161171/183829849-436295b3-8765-4351-91a7-4eb56593a79b.png)
   
   The new bulk load may be related to existing functions such as backup / 
restore and broker load.
   1. The FE issues the bulk load command, and the BE writes the tablet meta 
to HDFS.
   2. The FE then schedules a Spark / Flink job to run the segment builder, 
which reads the data files from HDFS, builds the segment files locally, and 
uploads them to HDFS when the build finishes.
   3. The FE then starts loading these segments from HDFS; each BE does the 
actual work.
   4. Finally, the FE publishes the transaction, as with broker load.
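
The four steps above could be tracked on the FE side as a simple linear state machine. The following is only a minimal sketch under that assumption: the phase names, `BulkLoadJob`, and `run` are hypothetical and do not correspond to existing Doris classes, and the real work of each step (writing meta, running the Spark / Flink job, downloading segments, publishing) is left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    PENDING = auto()          # job created, nothing done yet
    META_EXPORTED = auto()    # step 1: tablet meta written to HDFS
    SEGMENTS_BUILT = auto()   # step 2: segment files built and uploaded
    SEGMENTS_LOADED = auto()  # step 3: each BE has loaded its segments
    PUBLISHED = auto()        # step 4: transaction published


# Allowed transitions, in the order of steps 1-4.
NEXT_PHASE = {
    Phase.PENDING: Phase.META_EXPORTED,
    Phase.META_EXPORTED: Phase.SEGMENTS_BUILT,
    Phase.SEGMENTS_BUILT: Phase.SEGMENTS_LOADED,
    Phase.SEGMENTS_LOADED: Phase.PUBLISHED,
}


@dataclass
class BulkLoadJob:
    label: str
    phase: Phase = Phase.PENDING
    history: list = field(default_factory=list)

    def advance(self) -> Phase:
        """Move to the next phase; a published job cannot advance."""
        if self.phase not in NEXT_PHASE:
            raise RuntimeError(f"job {self.label} is already {self.phase.name}")
        self.history.append(self.phase)
        self.phase = NEXT_PHASE[self.phase]
        return self.phase


def run(job: BulkLoadJob) -> BulkLoadJob:
    """Drive a job through all phases in order (no real I/O here)."""
    while job.phase is not Phase.PUBLISHED:
        job.advance()
    return job


if __name__ == "__main__":
    finished = run(BulkLoadJob(label="demo_bulk_load"))
    print(finished.phase.name, [p.name for p in finished.history])
```

Keeping the phases strictly ordered makes it easy to say where a failed job should resume or be rolled back, which is one of the details the design still needs to settle.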
   
   
   
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


