vagetablechicken commented on issue #2780: OlapTableSink::send is low efficient?
URL: https://github.com/apache/incubator-doris/issues/2780#issuecomment-588156273

# OlapTableSink Multithreading Solution Design

## single-thread model (original model)

## multi-thread model

When we create an OlapTableSink, we prepare N buffers and N send threads. If a row needs to be added to NodeChannel(node_id=A) (only node_id is considered), the main thread copies it to buffer A%N.
We can limit the buffer size (using MemTracker); the mem_limit is configurable. If a buffer is full, we block adding rows to this buffer until all rows in the buffer have been consumed.

### Extra cost of multi-thread ver

- thread: buffer_num
- mem: buffer_num*(mem_limit+buffer_running_mem)

### single/multi version switch

The acceleration effect is evident for large data imports, but not every OlapTableSink needs it.
So I designed it as follows: we can select the origin (single-thread) version or the multi-thread version by setting broker load properties, and configure `buffer_num` and `mem_limit_per_buf`.

We add fields to TOlapTableSink, like

```
struct TOlapTableSink {
    ...
    14: optional i64 load_channel_timeout_s // the timeout of load channels in second
    15: optional i32 buffer_num
    16: optional i64 mem_limit_per_buf
    17: optional i64 size_limit_per_buf
}
```

`buffer_num > 0` means using "multi-thread & buffer" mode.
`buffer_num = 0` or not set means using "single-thread" mode (the origin mode).
(`buffer_num = 1` is hard to define. We should avoid setting buffer_num to 1.)

So we can use

```
LOAD LABEL ...
PROPERTIES
(
    "buffer_num"="5",
    "mem_limit_per_buf"="5368709120",
    "size_limit_per_buf"="62914560"
);
```

# A Test Case

cluster: 5 be

{"ScannedRows":895737370,"TaskNumber":1,"FileNumber":300,"FileSize":60216822679} // 56G

origin ver

| LoadStartTime | LoadFinishTime |
|----|----|

- [ ] TODO

multi-thread ver

buffer_num/mem_limit_per_buf/size_limit_per_buf = 5/1G/30M

| LoadStartTime | LoadFinishTime |
|----|----|
| 2020-02-18 17:57:38 | 2020-02-18 20:05:13 |

buffer_num/mem_limit_per_buf/size_limit_per_buf = 5/5G/30M

| LoadStartTime | LoadFinishTime |
|----|----|
| 2020-02-18 20:14:58 | 2020-02-18 22:19:54 |

buffer_num/mem_limit_per_buf/size_limit_per_buf = 5/5G/60M

| LoadStartTime | LoadFinishTime |
|----|----|
| 2020-02-18 22:28:08 | 2020-02-19 00:36:28 |

- [ ] TODO test analysis
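
For reference, the multi-thread model from the design section (N buffers, N send threads, rows routed to buffer `node_id % N`, and a producer that blocks on a full buffer until it has been drained) could be sketched roughly as below. This is only an illustration under assumptions, not the Doris implementation: `Row`, `RowBuffer`, and `MultiThreadSink` are hypothetical names, a plain byte counter stands in for MemTracker, `node_id` is assumed non-negative, and the actual send to a NodeChannel is replaced by a counter.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row destined for one NodeChannel.
struct Row {
    int64_t node_id;
    std::string payload;
};

// Bounded buffer with a single producer (the main thread) and a single consumer.
// Once the byte limit would be exceeded, add() blocks until the consumer has
// taken every row, matching "block ... until all rows have been consumed".
class RowBuffer {
public:
    explicit RowBuffer(size_t mem_limit) : _mem_limit(mem_limit) {}

    void add(Row row) {
        std::unique_lock<std::mutex> lock(_mu);
        if (_bytes + row.payload.size() > _mem_limit) {
            _drained.wait(lock, [this] { return _rows.empty(); });
        }
        _bytes += row.payload.size();
        _rows.push_back(std::move(row));
        _not_empty.notify_one();
    }

    // Returns false once the buffer is closed and empty.
    bool take(Row* out) {
        std::unique_lock<std::mutex> lock(_mu);
        _not_empty.wait(lock, [this] { return !_rows.empty() || _closed; });
        if (_rows.empty()) return false;
        *out = std::move(_rows.front());
        _rows.pop_front();
        _bytes -= out->payload.size();
        if (_rows.empty()) _drained.notify_all();
        return true;
    }

    void close() {
        std::lock_guard<std::mutex> lock(_mu);
        _closed = true;
        _not_empty.notify_all();
    }

private:
    const size_t _mem_limit;
    size_t _bytes = 0;
    bool _closed = false;
    std::deque<Row> _rows;
    std::mutex _mu;
    std::condition_variable _not_empty;
    std::condition_variable _drained;
};

class MultiThreadSink {
public:
    MultiThreadSink(int buffer_num, size_t mem_limit_per_buf) {
        for (int i = 0; i < buffer_num; ++i) {
            _buffers.emplace_back(new RowBuffer(mem_limit_per_buf));
        }
        for (int i = 0; i < buffer_num; ++i) {
            _senders.emplace_back([this, i] { send_loop(i); });
        }
    }
    ~MultiThreadSink() { close(); }

    // Called by the main thread: copy the row into buffer node_id % N.
    void send(Row row) {
        size_t idx = static_cast<size_t>(row.node_id) % _buffers.size();
        _buffers[idx]->add(std::move(row));
    }

    // Stops all send threads and returns the number of rows they consumed.
    size_t close() {
        for (auto& b : _buffers) b->close();
        for (auto& t : _senders) {
            if (t.joinable()) t.join();
        }
        return _sent.load();
    }

private:
    void send_loop(int i) {
        Row row;
        while (_buffers[i]->take(&row)) {
            // A real sink would add the row to its NodeChannel here.
            _sent.fetch_add(1);
        }
    }

    std::atomic<size_t> _sent{0};
    std::vector<std::unique_ptr<RowBuffer>> _buffers;
    std::vector<std::thread> _senders;
};
```

A caller would construct `MultiThreadSink sink(buffer_num, mem_limit_per_buf)`, call `sink.send(row)` for each row, and call `sink.close()` at the end. Note that two node ids that are congruent modulo N share one buffer and one send thread, so `buffer_num` bounds the parallelism regardless of how many nodes there are.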