xinyiZzz commented on a change in pull request #8476:
URL: https://github.com/apache/incubator-doris/pull/8476#discussion_r829013533



##########
File path: be/src/exec/tablet_info.cpp
##########
@@ -416,7 +416,7 @@ VOlapTablePartitionParam::VOlapTablePartitionParam(std::shared_ptr<OlapTableSche
         : _schema(schema),
           _t_param(t_param),
           _slots(_schema->tuple_desc()->slots()),
-          _mem_tracker(MemTracker::create_tracker(-1, "OlapTablePartitionParam")) {
+          _mem_tracker(MemTracker::create_virtual_tracker(-1, "OlapTablePartitionParam")) {

Review comment:
       Because this tracker needs to consume/release manually.
   The difference between a virtual tracker and a non-virtual tracker:
   
   - non-virtual tracker
   To keep the statistics of the non-virtual mem tracker tree absolutely accurate, there are only two ways to count: either switch the tls mem tracker via attach or switch and count in the tcmalloc new/delete hook, or transfer memory ownership between non-virtual trackers.
   
   - virtual tracker
   Manual consume/release, as before. The reasons for designing the virtual tracker: first, transferring memory ownership between two trackers does a release followed by a consume, which is slower than calling consume/release directly on the virtual tracker; second, virtual trackers can be filtered out via a parameter, which keeps the mem tracker tree from getting messy and makes adding or removing a virtual tracker safer.
   
   The non-virtual tracker is similar to the INFO log level, and the virtual tracker is similar to the DEBUG log level.
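   For intuition only, here is a minimal self-contained mock (not the real Doris `MemTracker`) contrasting the two styles: direct consume/release on a virtual tracker versus ownership transfer (release + consume) between non-virtual trackers.

```cpp
// Mock trackers for illustration only -- not the Doris MemTracker API.
#include <cstdint>
#include <iostream>
#include <string>

struct FakeTracker {
    std::string label;
    int64_t consumption = 0;
    void consume(int64_t bytes) { consumption += bytes; }
    void release(int64_t bytes) { consumption -= bytes; }
};

// Non-virtual style: memory already counted on `from` is handed to `to`,
// i.e. a release followed by a consume (two tracker updates per transfer).
void transfer(FakeTracker& from, FakeTracker& to, int64_t bytes) {
    from.release(bytes);
    to.consume(bytes);
}

int main() {
    FakeTracker process{"Process"};
    FakeTracker partition{"OlapTablePartitionParam"};

    // Virtual style: one manual consume and one manual release, nothing else.
    partition.consume(4096);
    partition.release(4096);

    // Non-virtual style: the same 4 KB has to be handed over between trackers.
    process.consume(4096);
    transfer(process, partition, 4096);
    std::cout << process.consumption << " " << partition.consumption << "\n";  // 0 4096
}
```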

##########
File path: be/src/exec/tablet_sink.cpp
##########
@@ -48,6 +49,7 @@ NodeChannel::NodeChannel(OlapTableSink* parent, IndexChannel* index_channel, int
     if (_parent->_transfer_data_by_brpc_attachment) {
         _tuple_data_buffer_ptr = &_tuple_data_buffer;
     }
+    _node_channel_tracker = MemTracker::create_tracker(-1, "NodeChannel");

Review comment:
       The BE id is the same for every mem tracker on one BE = _ =, so I added the thread id.
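   For example, the label could be built roughly like this (the format below is hypothetical, not necessarily the exact string used in this PR):

```cpp
// Hypothetical label construction -- illustration only, not the PR's code.
#include <cstdint>
#include <sstream>
#include <string>
#include <thread>

std::string node_channel_tracker_label(int64_t node_id) {
    std::ostringstream name;
    name << "NodeChannel:nodeId=" << node_id
         << ":threadId=" << std::this_thread::get_id();
    return name.str();
}
```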

##########
File path: be/src/exprs/agg_fn_evaluator.cpp
##########
@@ -154,7 +154,7 @@ Status AggFnEvaluator::prepare(RuntimeState* state, const RowDescriptor& desc, M
     _intermediate_slot_desc = intermediate_slot_desc;
 
     _string_buffer_len = 0;
-    _mem_tracker = mem_tracker;
+    _mem_tracker = MemTracker::create_virtual_tracker(-1, "AggFnEvaluator", mem_tracker);

Review comment:
       The `mem_tracker` param is used in `Expr::prepare` on the next line, which I modified.
   In addition, it is also the parent of the virtual tracker `_mem_tracker`.
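   A tiny self-contained mock (again, not the real `MemTracker`) of why the parent matters, assuming, as in a typical tracker tree, that consumption on a child also counts against its ancestors:

```cpp
// Mock parent/child trackers for illustration only.
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>

struct FakeTracker {
    std::string label;
    std::shared_ptr<FakeTracker> parent;
    int64_t consumption = 0;
    void consume(int64_t bytes) {
        // Charge this tracker and every ancestor up the tree.
        for (FakeTracker* t = this; t != nullptr; t = t->parent.get()) {
            t->consumption += bytes;
        }
    }
};

int main() {
    auto query = std::make_shared<FakeTracker>(FakeTracker{"Query", nullptr});
    auto agg = std::make_shared<FakeTracker>(FakeTracker{"AggFnEvaluator", query});
    agg->consume(1024);
    std::cout << query->consumption << "\n";  // 1024: the child's usage rolls up to its parent
}
```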

##########
File path: be/src/olap/memtable.cpp
##########
@@ -31,14 +31,13 @@ namespace doris {
 
 MemTable::MemTable(int64_t tablet_id, Schema* schema, const TabletSchema* tablet_schema,
                    const std::vector<SlotDescriptor*>* slot_descs, TupleDescriptor* tuple_desc,
-                   KeysType keys_type, RowsetWriter* rowset_writer,
-                   const std::shared_ptr<MemTracker>& parent_tracker)
+                   KeysType keys_type, RowsetWriter* rowset_writer)
         : _tablet_id(tablet_id),
           _schema(schema),
           _tablet_schema(tablet_schema),
           _slot_descs(slot_descs),
           _keys_type(keys_type),
-          _mem_tracker(MemTracker::create_tracker(-1, "MemTable", parent_tracker)),
+          _mem_tracker(MemTracker::create_tracker(-1, "MemTable")),

Review comment:
       The difference between virtual and non-virtual trackers is explained 
above

##########
File path: be/src/olap/rowset/segment_v2/segment.cpp
##########
@@ -51,10 +51,10 @@ Segment::Segment(const FilePathDesc& path_desc, uint32_t segment_id,
                 const TabletSchema* tablet_schema)
        : _path_desc(path_desc), _segment_id(segment_id), _tablet_schema(tablet_schema) {
 #ifndef BE_TEST
-    _mem_tracker = MemTracker::create_tracker(
+    _mem_tracker = MemTracker::create_virtual_tracker(

Review comment:
       The difference between virtual and non-virtual trackers is explained 
above

##########
File path: be/src/olap/snapshot_manager.h
##########
@@ -99,6 +102,9 @@ class SnapshotManager {
     // snapshot
     Mutex _snapshot_mutex;
     uint64_t _snapshot_base_id;
+
+    // TODO(zxy) used after
+    std::shared_ptr<MemTracker> _mem_tracker = nullptr;

Review comment:
       In the next PR, this tracker will be switched to the tls mem tracker in the public functions of `SnapshotManager`.
   
   In this PR, I created all the trackers that will be used in the future and built a complete mem tracker tree. (Perhaps it would be better to do this in a separate PR... Do you think it needs to be deleted first? = =||| )
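   Roughly, the intended pattern looks like the sketch below; the guard class and names are hypothetical, and the follow-up PR may use a different macro or helper.

```cpp
// Hypothetical sketch of "switch the tls mem tracker in a public function".
// Allocations made while the guard is alive would be charged to the switched
// tracker by the tcmalloc new/delete hooks. Names are illustrative only.
#include <memory>
#include <utility>

struct FakeTracker {};  // stand-in for MemTracker

thread_local std::shared_ptr<FakeTracker> tls_mem_tracker;

class ScopedTlsTrackerSwitch {
public:
    explicit ScopedTlsTrackerSwitch(std::shared_ptr<FakeTracker> t)
            : _saved(std::move(tls_mem_tracker)) {
        tls_mem_tracker = std::move(t);
    }
    ~ScopedTlsTrackerSwitch() { tls_mem_tracker = std::move(_saved); }

private:
    std::shared_ptr<FakeTracker> _saved;
};

// e.g. at the top of a public SnapshotManager function:
//     ScopedTlsTrackerSwitch guard(_mem_tracker);
//     ... allocations here are counted against _mem_tracker ...
```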

##########
File path: be/src/olap/tablet_manager.cpp
##########
@@ -73,7 +73,7 @@ static bool _cmp_tablet_by_create_time(const TabletSharedPtr& a, const TabletSha
 }
 
 TabletManager::TabletManager(int32_t tablet_map_lock_shard_size)
-        : _mem_tracker(MemTracker::create_tracker(-1, "TabletManager", nullptr,
+        : _mem_tracker(MemTracker::create_virtual_tracker(-1, "TabletManager", nullptr,

Review comment:
       Because a cache is involved, the memory statistics of `TabletManager` were skipped before; this has been implemented in the updated code.

##########
File path: be/src/olap/task/engine_alter_tablet_task.cpp
##########
@@ -25,7 +25,14 @@ namespace doris {
 using std::to_string;
 
 EngineAlterTabletTask::EngineAlterTabletTask(const TAlterTabletReqV2& request)
-        : _alter_tablet_req(request) {}
+        : _alter_tablet_req(request) {
+    _mem_tracker = MemTracker::create_tracker(

Review comment:
       Same as above, `// TODO(zxy) used after` (this tracker will be used in a later PR).

##########
File path: be/src/runtime/dpp_sink.cpp
##########
@@ -0,0 +1,926 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       No, it needs to be deleted. This file was deleted on master; I didn't pay attention when I rebased.

##########
File path: be/src/runtime/row_batch.cpp
##########
@@ -39,8 +40,8 @@ namespace doris {
 const int RowBatch::AT_CAPACITY_MEM_USAGE = 8 * 1024 * 1024;
 const int RowBatch::FIXED_LEN_BUFFER_LIMIT = AT_CAPACITY_MEM_USAGE / 2;
 
-RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity, MemTracker* mem_tracker)
-        : _mem_tracker(mem_tracker),
+RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity)
+        : _mem_tracker(thread_local_ctx.get()->_thread_mem_tracker_mgr->mem_tracker()),

Review comment:
       I understand that switching the mem tracker of `rowbatch` is not directly related to the thread it runs on.
   
   My understanding of this concern is: could there be a `rowbatch` that consumes tls mem tracker A on new and releases tls mem tracker B on delete, so that both trackers are non-zero when they are finally destructed?
   
   The current code avoids this problem. `RowBatch::acquire_state` and `RowBatch::transfer_resource_ownership` perform the mem_tracker update and the transfer of buffer ownership between two `row_batch` instances, so the new and delete of a buffer in a `rowbatch` never happen on different trackers.
   
   For example, in `OlapScanNode::get_next`, the `rowbatch` created by the Scanner is transferred to the external parameter `row_batch` through `RowBatch::acquire_state`, and the buffer's mem tracker is updated; ownership is moved between the two row_batches via `update_mem_tracker`.
   
   But I'm not sure the buffer mem_tracker is updated in all similar places; this requires manual maintenance to ensure that the new and delete of a buffer happen on the same tracker. **I will test this further.**
   
   `mem pool` has a similar situation: it also provides `MemPool::acquire_data` and `MemPool::exchange_data` to transfer chunks. However, when I previously charged the tls mem tracker on each `chunk` allocation, I found that the `chunk` tracker differed from the tls mem tracker when the `mem pool` was destructed.
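   A self-contained mock of the mismatch described above (not the real `RowBatch`/`MemTracker`): if a buffer is counted against tls tracker A at new but released against tls tracker B at delete, both end up non-zero; transferring ownership first (release + consume, which is what `RowBatch::acquire_state` / `transfer_resource_ownership` arrange) keeps both at zero.

```cpp
// Mock trackers for illustration only.
#include <cstdint>
#include <iostream>

struct FakeTracker {
    int64_t consumption = 0;
};

int main() {
    constexpr int64_t kBuf = 8 * 1024;  // an 8 KB tuple buffer

    // Buggy sequence: allocated under tls tracker A, freed under tls tracker B.
    FakeTracker a, b;
    a.consumption += kBuf;  // new while A is the current tls tracker
    b.consumption -= kBuf;  // delete while B is the current tls tracker
    std::cout << a.consumption << " " << b.consumption << "\n";  // 8192 -8192

    // Fixed sequence: ownership is moved from C to D before the delete.
    FakeTracker c, d;
    c.consumption += kBuf;  // new while C is the current tls tracker
    c.consumption -= kBuf;  // hand-over: release on the old tracker ...
    d.consumption += kBuf;  // ... consume on the new one
    d.consumption -= kBuf;  // delete while D is the current tls tracker
    std::cout << c.consumption << " " << d.consumption << "\n";  // 0 0
}
```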

##########
File path: be/src/runtime/thread_context.h
##########
@@ -50,27 +56,65 @@ class ThreadContext {
     };
 
 public:
-    ThreadContext() : _thread_id(std::this_thread::get_id()), _type(TaskType::UNKNOWN) {}
+    ThreadContext() : _thread_id(std::this_thread::get_id()), _type(TaskType::UNKNOWN) {
+        _thread_mem_tracker_mgr.reset(new ThreadMemTrackerMgr());
+    }
 
     void attach(const TaskType& type, const std::string& task_id,
-                const TUniqueId& fragment_instance_id) {
+                const TUniqueId& fragment_instance_id,
+                const std::shared_ptr<MemTracker>& mem_tracker) {
         DCHECK(_type == TaskType::UNKNOWN && _task_id == "");
         _type = type;
         _task_id = task_id;
         _fragment_instance_id = fragment_instance_id;
+        _thread_mem_tracker_mgr->attach_task(task_type_string(_type), task_id, fragment_instance_id,
+                                             mem_tracker);
     }
 
     void detach() {
         _type = TaskType::UNKNOWN;
         _task_id = "";
         _fragment_instance_id = TUniqueId();
+        _thread_mem_tracker_mgr->detach_task();
     }
 
-    const std::string type() const;
     const std::string& task_id() const { return _task_id; }
     const std::thread::id& thread_id() const { return _thread_id; }
     const TUniqueId& fragment_instance_id() const { return _fragment_instance_id; }
 
+    inline static const std::string task_type_string(ThreadContext::TaskType type) {
+        switch (type) {
+        case ThreadContext::TaskType::QUERY:
+            return "QUERY";
+        case ThreadContext::TaskType::LOAD:
+            return "LOAD";
+        case ThreadContext::TaskType::COMPACTION:
+            return "COMPACTION";
+        default:
+            return "UNKNOWN";
+        }

Review comment:
       I see, it seems that compilers do compile switch constructs into an array of code pointers, as said in the link above.
   
   I changed it to an array as you suggested, which seems more concise. thx~ @dataroaring
   
   Also, I did a simple test in the benchmark and switch constructs seem faster; I didn't analyze the assembly carefully, and performance is not the bottleneck in this case.
   
   ![image](https://user-images.githubusercontent.com/13197424/158992378-7b821c8f-e528-4cca-b447-7f33a8f425e2.png)
   
![image](https://user-images.githubusercontent.com/13197424/158992899-70740800-6299-471f-a1c3-162fb8ea69ea.png)
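   For reference, the array version looks roughly like this; the enum ordering below is an assumption, not necessarily the one in the codebase.

```cpp
// Illustrative sketch of the array-based lookup; the TaskType ordering is assumed.
#include <string>

enum class TaskType { UNKNOWN = 0, QUERY, LOAD, COMPACTION };

inline const std::string& task_type_string(TaskType type) {
    // Index a static string table by the enum value instead of going through a switch.
    static const std::string names[] = {"UNKNOWN", "QUERY", "LOAD", "COMPACTION"};
    return names[static_cast<int>(type)];
}
```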
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


