[GitHub] [doris] compasses opened a new issue, #11640: [Feature] New BulkLoad, which support build segment file isolate from Doris cluster

GitBox Tue, 09 Aug 2022 23:35:35 -0700


compasses opened a new issue, #11640:
URL: https://github.com/apache/doris/issues/11640

### Search before asking

- [X] I had searched in the
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and
found no similar issues.

### Description

Currently, there are several ways to load data into Doris, such as broker load
and stream load. From our perspective, all of them share some shortcomings,
for example:

1. High resource cost: each tablet has multiple replicas, and each replica
performs the data load separately.
2. It is hard to keep the cluster stable and performant, because load jobs can
cause high load and resource contention.
3. Query latency fluctuates, and this kind of issue may come up again and
again.

So we may want a new way to load data, something like lightweight read / write
splitting, which can sustain high throughput for both writes and reads.

Here we only have a very rough design, and many details still need to be
clarified.

The overall flow:

![image](https://user-images.githubusercontent.com/10161171/183829849-436295b3-8765-4351-91a7-4eb56593a79b.png)

The new bulk load may be related to existing functions such as backup /
restore and broker load.

1. FE issues the bulk load command, and BE writes the tablet meta to HDFS.
2. FE then schedules a Spark / Flink job to run the segment builder, which
reads the data files from HDFS, builds the segment files locally, and uploads
them to HDFS when the build finishes.
3. FE then starts loading these segments from HDFS; each BE does the actual
work.
4. Finally, FE publishes the transaction, as with broker load.
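
To make the four steps above a bit more concrete, here is a minimal Python sketch of the flow. It is only an illustration under loud assumptions: `BulkLoadJob`, `run_bulk_load`, the `meta/` and `segment/` keys, and the in-memory `shared_storage` dict (standing in for HDFS) are all hypothetical names, not existing Doris classes or APIs, and sorting the rows is just a placeholder for the real segment build.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    META_EXPORTED = 1
    SEGMENTS_BUILT = 2
    SEGMENTS_LOADED = 3
    PUBLISHED = 4


@dataclass
class BulkLoadJob:
    """Hypothetical coordinator state for one bulk load (not a Doris class)."""
    tablet_ids: list
    replicas_per_tablet: int
    shared_storage: dict = field(default_factory=dict)  # stands in for HDFS
    phases: list = field(default_factory=list)
    builds: int = 0     # segment builds done outside the cluster
    downloads: int = 0  # prebuilt segment copies fetched by BE replicas

    def export_tablet_meta(self):
        # Step 1: FE issues the command, BE writes tablet meta to shared storage.
        for tablet_id in self.tablet_ids:
            self.shared_storage[f"meta/{tablet_id}"] = {"tablet_id": tablet_id}
        self.phases.append(Phase.META_EXPORTED)

    def build_segments(self, rows_by_tablet):
        # Step 2: an external job (Spark / Flink in the proposal) builds each
        # segment once and uploads it. sorted() is a placeholder for the build.
        for tablet_id in self.tablet_ids:
            if f"meta/{tablet_id}" not in self.shared_storage:
                raise RuntimeError("tablet meta must be exported first")
            rows = rows_by_tablet.get(tablet_id, [])
            self.shared_storage[f"segment/{tablet_id}"] = sorted(rows)
            self.builds += 1
        self.phases.append(Phase.SEGMENTS_BUILT)

    def load_segments(self):
        # Step 3: every replica fetches the prebuilt segment instead of
        # repeating the build work itself.
        for tablet_id in self.tablet_ids:
            if f"segment/{tablet_id}" not in self.shared_storage:
                raise RuntimeError("segment must be built first")
            self.downloads += self.replicas_per_tablet
        self.phases.append(Phase.SEGMENTS_LOADED)

    def publish(self):
        # Step 4: publish the transaction only after all segments are loaded.
        if not self.phases or self.phases[-1] is not Phase.SEGMENTS_LOADED:
            raise RuntimeError("cannot publish before segments are loaded")
        self.phases.append(Phase.PUBLISHED)


def run_bulk_load(tablet_ids, replicas_per_tablet, rows_by_tablet):
    job = BulkLoadJob(list(tablet_ids), replicas_per_tablet)
    job.export_tablet_meta()
    job.build_segments(rows_by_tablet)
    job.load_segments()
    job.publish()
    return job
```

In this sketch, a job with 2 tablets and 3 replicas performs 2 builds outside the cluster and 6 segment downloads inside it, which is the point of the proposal: the expensive build happens once per tablet rather than once per replica.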

### Use case

_No response_

### Related issues

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org