HappenLee commented on a change in pull request #3820:
URL: https://github.com/apache/incubator-doris/pull/3820#discussion_r439339662
##########
File path: be/src/runtime/data_stream_recvr.cc
##########

@@ -360,9 +360,14 @@ DataStreamRecvr::DataStreamRecvr(
         _row_desc(row_desc),
         _is_merging(is_merging),
         _num_buffered_bytes(0),
-        _profile(profile),
         _sub_plan_query_statistics_recvr(sub_plan_query_statistics_recvr) {
-    _mem_tracker.reset(new MemTracker(-1, "DataStreamRecvr", parent_tracker));
+    _profile.reset(new RuntimeProfile(nullptr, "DataStreamRecvr"));
+    profile->add_child(_profile.get(), true, nullptr);
+
+    // TODO: Now the parent tracker may cause problem when we need spill to disk, so we

Review comment:
   To be honest, the problem is a little tricky. The data to be spilled to disk is read from the underlying `EXCHANGE NODE`, and that process is asynchronous; two different threads are involved:

   1. `DataStreamRecv` continuously obtains batches through RPC calls and writes them to its local queue. In this process, the memory consumption of each batch is recorded in the `MemTracker`.
   2. When we need to spill to disk, `BlockMgr` tries to use all the available memory to speed up the query, so it keeps calling `try_consume` to get memory from the `MemTracker`.
   3. Once `try_consume` fails, the `SortNode` tries to spill to disk. At this point memory usage has approached the threshold (but does not yet exceed the `MemTracker` limit).
   4. If a thread switch happens now, from the `SortNode` to `DataStreamRecv`: because the `SortNode` is busy with CPU-intensive work such as sorting, it does not pull data from the `Exchange Node` in time, so `DataStreamRecv` keeps writing data to the queue and recording the memory consumption (memory usage now exceeds the `MemTracker` limit).
   5. When the thread switches back to the `SortNode`, the limit check finds that the `MemTracker` has exceeded the memory limit, and the query fails.

   Although the queue in `DataStreamRecv` has its own memory limit, it only records the size before protobuf deserialization, while the `MemTracker` records the size after deserialization. Because protobuf compresses the data, there was roughly a threefold gap between the two in my test scenario.

   At present, my idea is to track this part of memory independently and use a `RuntimeProfile` to record the memory used by `DataStreamRecv`, so as to avoid this problem for now. How can we solve the problem better later? I'd like to have a further discussion in the community.
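   For what it's worth, here is a minimal, self-contained sketch of the race described above. `SimpleTracker` and the byte sizes are hypothetical stand-ins, not the actual Doris `MemTracker`/`BlockMgr`/`SortNode` classes; it only shows how an unconditional `consume()` from the receiver thread can push usage past the limit between the sort node's failed `try_consume()` and its limit check.

   ```cpp
   // Sketch only: simplified tracker with the two paths involved in the race.
   #include <atomic>
   #include <cstdint>
   #include <iostream>
   #include <thread>

   class SimpleTracker {
   public:
       explicit SimpleTracker(int64_t limit) : _limit(limit) {}

       // Receiver path: records batch memory unconditionally after RPC deserialization.
       void consume(int64_t bytes) { _consumed.fetch_add(bytes); }

       // BlockMgr path: only succeeds while staying under the limit.
       bool try_consume(int64_t bytes) {
           int64_t cur = _consumed.load();
           while (cur + bytes <= _limit) {
               if (_consumed.compare_exchange_weak(cur, cur + bytes)) return true;
           }
           return false;
       }

       bool limit_exceeded() const { return _consumed.load() > _limit; }

   private:
       const int64_t _limit;
       std::atomic<int64_t> _consumed{0};
   };

   int main() {
       SimpleTracker tracker(100);

       // Step 2/3: BlockMgr keeps calling try_consume until it fails,
       // so usage is now right at the threshold.
       while (tracker.try_consume(10)) {}

       // Step 4: thread switch to the receiver, which deserializes another batch
       // and records it unconditionally (deserialized size > on-wire size).
       std::thread receiver([&] { tracker.consume(30); });
       receiver.join();

       // Step 5: the sort node resumes and wants to spill, but the limit check
       // already fails, so the whole query is cancelled.
       std::cout << "limit exceeded: " << std::boolalpha
                 << tracker.limit_exceeded() << std::endl;
       return 0;
   }
   ```

   As I understand the change in this PR, giving the receiver an independent tracker (plus a child `RuntimeProfile` for visibility) sidesteps exactly this interleaving, at the cost of the receiver's memory no longer counting toward the parent query limit.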