This is an automated email from the ASF dual-hosted git repository.

zouxinyi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
     new 3bb920ba54 [Enhancement](load) Refine the load channel flush policy on mem limit (#12716)
3bb920ba54 is described below

commit 3bb920ba54236ebafb90fdee34a5453fa5411cfb
Author: zhannngchen <48427519+zhannngc...@users.noreply.github.com>
AuthorDate: Sat Sep 24 10:01:13 2022 +0800

    [Enhancement](load) Refine the load channel flush policy on mem limit (#12716)

    1. Remove the single load channel mem limit; only use the load channel mgr mem limit.
    2. Default load channel mgr mem limit changed from 50% to 80%.
    3. Add a soft mem limit to the load channel mgr. When the soft limit is exceeded,
       other threads will not hang; only the current thread triggers a flush.
    4. When the load channel mgr mem limit is exceeded, find the load channel with the
       largest mem usage, then find the tablets channel with the largest mem usage inside
       it, and try to flush 1/3 of the mem usage of this tablets channel.
---
 be/src/common/config.h                           |   7 ++
 be/src/olap/delta_writer.cpp                     |  15 +--
 be/src/runtime/load_channel.h                    |  39 +++---
 be/src/runtime/load_channel_mgr.cpp              |  32 ++---
 be/src/runtime/load_channel_mgr.h                |  93 ++++++++++----
 be/src/runtime/tablets_channel.cpp               | 151 +++++++++++++----
 be/src/runtime/tablets_channel.h                 |  11 +-
 docs/en/docs/admin-manual/config/be-config.md    |  10 +-
 docs/zh-CN/docs/admin-manual/config/be-config.md |  10 +-
 9 files changed, 228 insertions(+), 140 deletions(-)

diff --git a/be/src/common/config.h b/be/src/common/config.h
index 68ddb20a31..fa93258475 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -520,6 +520,13 @@ CONF_mInt64(memtable_max_buffer_size, "419430400");
 CONF_Int64(load_process_max_memory_limit_bytes, "107374182400"); // 100GB
 CONF_Int32(load_process_max_memory_limit_percent, "50"); // 50%
+// If the memory consumption of load jobs exceeds load_process_max_memory_limit,
+// all load jobs will hang there to wait for memtable flush. We should have a
+// soft limit which can trigger the memtable flush for the load channel that
+// consumes the largest memory size before we reach the hard limit. The soft limit
+// might avoid all load jobs hanging at the same time.
+CONF_Int32(load_process_soft_mem_limit_percent, "50");
+
 // result buffer cancelled time (unit: second)
 CONF_mInt32(result_buffer_cancelled_interval_time, "300");
diff --git a/be/src/olap/delta_writer.cpp b/be/src/olap/delta_writer.cpp
index 6719b9ddf6..9df10f0c40 100644
--- a/be/src/olap/delta_writer.cpp
+++ b/be/src/olap/delta_writer.cpp
@@ -250,14 +250,11 @@ Status DeltaWriter::flush_memtable_and_wait(bool need_wait) {
         return Status::OLAPInternalError(OLAP_ERR_ALREADY_CANCELLED);
     }
-    if (_flush_token->get_stats().flush_running_count == 0) {
-        // equal means there is no memtable in flush queue, just flush this memtable
-        VLOG_NOTICE << "flush memtable to reduce mem consumption. memtable size: "
-                    << _mem_table->memory_usage() << ", tablet: " << _req.tablet_id
-                    << ", load id: " << print_id(_req.load_id);
-        RETURN_NOT_OK(_flush_memtable_async());
-        _reset_mem_table();
-    }
+    VLOG_NOTICE << "flush memtable to reduce mem consumption. memtable size: "
+                << _mem_table->memory_usage() << ", tablet: " << _req.tablet_id
+                << ", load id: " << print_id(_req.load_id);
+    RETURN_NOT_OK(_flush_memtable_async());
+    _reset_mem_table();
 
     if (need_wait) {
         // wait all memtables in flush queue to be flushed.
@@ -408,7 +405,7 @@ int64_t DeltaWriter::memtable_consumption() const {
     if (_mem_table == nullptr) {
         return 0;
     }
-    return _mem_table->memory_usage();
+    return _mem_table->mem_tracker_hook()->consumption();
 }
 
 int64_t DeltaWriter::partition_id() const {
diff --git a/be/src/runtime/load_channel.h b/be/src/runtime/load_channel.h
index 125cd12fe1..8bc70e5a5e 100644
--- a/be/src/runtime/load_channel.h
+++ b/be/src/runtime/load_channel.h
@@ -62,10 +62,10 @@ public:
     // check if this load channel mem consumption exceeds limit.
     // If yes, it will pick a tablets channel to try to reduce memory consumption.
-    // If force is true, even if this load channel does not exceeds limit, it will still
-    // try to reduce memory.
+    // The method will not return until the chosen tablets channel has finished its
+    // memtable flush.
     template <typename TabletWriterAddResult>
-    Status handle_mem_exceed_limit(bool force, TabletWriterAddResult* response);
+    Status handle_mem_exceed_limit(TabletWriterAddResult* response);
 
     int64_t mem_consumption() const { return _mem_tracker->consumption(); }
 
@@ -141,10 +141,7 @@ Status LoadChannel::add_batch(const TabletWriterAddRequest& request,
         return st;
     }
 
-    // 2. check if mem consumption exceed limit
-    RETURN_IF_ERROR(handle_mem_exceed_limit(false, response));
-
-    // 3. add batch to tablets channel
+    // 2. add batch to tablets channel
     if constexpr (std::is_same_v<TabletWriterAddRequest, PTabletWriterAddBatchRequest>) {
         if (request.has_row_batch()) {
             RETURN_IF_ERROR(channel->add_batch(request, response));
@@ -155,7 +152,7 @@ Status LoadChannel::add_batch(const TabletWriterAddRequest& request,
         }
     }
 
-    // 4. handle eos
+    // 3. handle eos
     if (request.has_eos() && request.eos()) {
         st = _handle_eos(channel, request, response);
         if (!st.ok()) {
@@ -174,21 +171,21 @@ inline std::ostream& operator<<(std::ostream& os, const LoadChannel& load_channe
 }
 
 template <typename TabletWriterAddResult>
-Status LoadChannel::handle_mem_exceed_limit(bool force, TabletWriterAddResult* response) {
-    // lock so that only one thread can check mem limit
-    std::lock_guard<std::mutex> l(_lock);
-    if (!(force || _mem_tracker->limit_exceeded())) {
-        return Status::OK();
-    }
+Status LoadChannel::handle_mem_exceed_limit(TabletWriterAddResult* response) {
+    bool found = false;
+    std::shared_ptr<TabletsChannel> channel;
+    {
+        // lock so that only one thread can check mem limit
+        std::lock_guard<std::mutex> l(_lock);
 
-    if (!force) {
-        LOG(INFO) << "reducing memory of " << *this << " because its mem consumption "
-                  << _mem_tracker->consumption() << " has exceeded limit " << _mem_tracker->limit();
+        LOG(INFO) << "reducing memory of " << *this
+                  << ", mem consumption: " << _mem_tracker->consumption();
+        found = _find_largest_consumption_channel(&channel);
     }
-
-    std::shared_ptr<TabletsChannel> channel;
-    if (_find_largest_consumption_channel(&channel)) {
-        return channel->reduce_mem_usage(_mem_tracker->limit(), response);
+    // Release lock so that other threads can still call add_batch concurrently.
+    if (found) {
+        DCHECK(channel != nullptr);
+        return channel->reduce_mem_usage(response);
     } else {
         // should not happen, add log to observe
         LOG(WARNING) << "fail to find suitable tablets-channel when memory exceed. "
diff --git a/be/src/runtime/load_channel_mgr.cpp b/be/src/runtime/load_channel_mgr.cpp
index e45fabd994..b28509ec93 100644
--- a/be/src/runtime/load_channel_mgr.cpp
+++ b/be/src/runtime/load_channel_mgr.cpp
@@ -42,21 +42,6 @@ static int64_t calc_process_max_load_memory(int64_t process_mem_limit) {
     return std::min<int64_t>(max_load_memory_bytes, config::load_process_max_memory_limit_bytes);
 }
 
-// Calculate the memory limit for a single load channel.
-static int64_t calc_channel_max_load_memory(int64_t load_mem_limit, int64_t total_mem_limit) {
-    // default mem limit is used to be compatible with old request.
-    // new request should be set load_mem_limit.
-    constexpr int64_t default_channel_mem_limit = 2 * 1024 * 1024 * 1024L; // 2GB
-    int64_t channel_mem_limit = default_channel_mem_limit;
-    if (load_mem_limit != -1) {
-        // mem-limit of a certain load should between config::write_buffer_size
-        // and total-memory-limit
-        channel_mem_limit = std::max<int64_t>(load_mem_limit, config::write_buffer_size);
-        channel_mem_limit = std::min<int64_t>(channel_mem_limit, total_mem_limit);
-    }
-    return channel_mem_limit;
-}
-
 static int64_t calc_channel_timeout_s(int64_t timeout_in_req_s) {
     int64_t load_channel_timeout_s = config::streaming_load_rpc_max_alive_time_sec;
     if (timeout_in_req_s > 0) {
@@ -84,6 +69,8 @@ LoadChannelMgr::~LoadChannelMgr() {
 
 Status LoadChannelMgr::init(int64_t process_mem_limit) {
     int64_t load_mgr_mem_limit = calc_process_max_load_memory(process_mem_limit);
+    _load_process_soft_mem_limit =
+            load_mgr_mem_limit * config::load_process_soft_mem_limit_percent / 100;
     _mem_tracker = std::make_shared<MemTrackerLimiter>(load_mgr_mem_limit, "LoadChannelMgr");
     REGISTER_HOOK_METRIC(load_channel_mem_consumption,
                         [this]() { return _mem_tracker->consumption(); });
@@ -107,11 +94,9 @@ Status LoadChannelMgr::open(const PTabletWriterOpenRequest& params) {
     int64_t channel_timeout_s = calc_channel_timeout_s(timeout_in_req_s);
     bool is_high_priority = (params.has_is_high_priority() && params.is_high_priority());
 
-    int64_t load_mem_limit = params.has_load_mem_limit() ? params.load_mem_limit() : -1;
-    int64_t channel_mem_limit =
-            calc_channel_max_load_memory(load_mem_limit, _mem_tracker->limit());
+    // Use the same mem limit as LoadChannelMgr for a single load channel
     auto channel_mem_tracker = std::make_shared<MemTrackerLimiter>(
-            channel_mem_limit,
+            _mem_tracker->limit(),
             fmt::format("LoadChannel#senderIp={}#loadID={}", params.sender_ip(),
                         load_id.to_string()),
             _mem_tracker);
@@ -126,6 +111,15 @@ Status LoadChannelMgr::open(const PTabletWriterOpenRequest& params) {
     return Status::OK();
 }
 
+void LoadChannelMgr::_try_to_wait_flushing() {
+    std::unique_lock<std::mutex> l(_lock);
+    while (_should_wait_flush) {
+        LOG(INFO) << "Reached the load channel manager mem limit " << _mem_tracker->limit()
+                  << ", waiting for flush";
+        _wait_flush_cond.wait(l);
+    }
+}
+
 static void dummy_deleter(const CacheKey& key, void* value) {}
 
 void LoadChannelMgr::_finish_load_channel(const UniqueId load_id) {
diff --git a/be/src/runtime/load_channel_mgr.h b/be/src/runtime/load_channel_mgr.h
index ebcba803e5..9ff88f9210 100644
--- a/be/src/runtime/load_channel_mgr.h
+++ b/be/src/runtime/load_channel_mgr.h
@@ -68,6 +68,7 @@ private:
     // If yes, it will pick a load channel to try to reduce memory consumption.
     template <typename TabletWriterAddResult>
     Status _handle_mem_exceed_limit(TabletWriterAddResult* response);
+    void _try_to_wait_flushing();
 
     Status _start_bg_worker();
 
@@ -76,10 +77,17 @@ protected:
     std::mutex _lock;
     // load id -> load channel
     std::unordered_map<UniqueId, std::shared_ptr<LoadChannel>> _load_channels;
+    std::shared_ptr<LoadChannel> _reduce_memory_channel = nullptr;
     Cache* _last_success_channel = nullptr;
 
     // check the total load channel mem consumption of this Backend
     std::shared_ptr<MemTrackerLimiter> _mem_tracker;
+    int64_t _load_process_soft_mem_limit = -1;
+
+    // If hard limit reached, one thread will trigger load channel flush,
+    // other threads should wait on the condition variable.
+    bool _should_wait_flush = false;
+    std::condition_variable _wait_flush_cond;
 
     CountDownLatch _stop_background_threads_latch;
     // thread to clean timeout load channels
@@ -143,36 +151,75 @@ Status LoadChannelMgr::add_batch(const TabletWriterAddRequest& request,
 
 template <typename TabletWriterAddResult>
 Status LoadChannelMgr::_handle_mem_exceed_limit(TabletWriterAddResult* response) {
-    // lock so that only one thread can check mem limit
-    std::lock_guard<std::mutex> l(_lock);
-    if (!_mem_tracker->limit_exceeded()) {
+    _try_to_wait_flushing();
+    // Check the soft limit.
+    DCHECK(_load_process_soft_mem_limit > 0);
+    if (_mem_tracker->consumption() < _load_process_soft_mem_limit) {
         return Status::OK();
     }
-
-    int64_t max_consume = 0;
+    // Pick load channel to reduce memory.
    std::shared_ptr<LoadChannel> channel;
-    for (auto& kv : _load_channels) {
-        if (kv.second->is_high_priority()) {
-            // do not select high priority channel to reduce memory
-            // to avoid blocking them.
-            continue;
+    {
+        std::lock_guard<std::mutex> l(_lock);
+        // Some other thread is flushing data and the hard limit is not reached yet,
+        // so we don't need to handle the mem limit in the current thread.
+        if (_reduce_memory_channel != nullptr && !_mem_tracker->limit_exceeded()) {
+            return Status::OK();
         }
-        if (kv.second->mem_consumption() > max_consume) {
-            max_consume = kv.second->mem_consumption();
-            channel = kv.second;
+
+        // We need to pick a LoadChannel to reduce memory usage.
+        // If `_reduce_memory_channel` is not null, it means the hard limit is
+        // exceeded now and we still need to pick a load channel again, because
+        // `_reduce_memory_channel` might not be the largest consumer now.
+        int64_t max_consume = 0;
+        for (auto& kv : _load_channels) {
+            if (kv.second->is_high_priority()) {
+                // do not select high priority channel to reduce memory
+                // to avoid blocking them.
+                continue;
+            }
+            if (kv.second->mem_consumption() > max_consume) {
+                max_consume = kv.second->mem_consumption();
+                channel = kv.second;
+            }
         }
+        if (max_consume == 0) {
+            // should not happen, add log to observe
+            LOG(WARNING) << "failed to find suitable load channel when total load mem limit exceed";
+            return Status::OK();
+        }
+        DCHECK(channel.get() != nullptr);
+        _reduce_memory_channel = channel;
+
+        std::ostringstream oss;
+        oss << "reducing memory of " << *channel << " because total load mem consumption "
+            << _mem_tracker->consumption() << " has exceeded";
+        if (_mem_tracker->limit_exceeded()) {
+            _should_wait_flush = true;
+            oss << " hard limit: " << _mem_tracker->limit();
+        } else {
+            oss << " soft limit: " << _load_process_soft_mem_limit;
+        }
+        LOG(INFO) << oss.str();
     }
-    if (max_consume == 0) {
-        // should not happen, add log to observe
-        LOG(WARNING) << "failed to find suitable load channel when total load mem limit exceed";
-        return Status::OK();
-    }
-    DCHECK(channel.get() != nullptr);
-    // force reduce mem limit of the selected channel
-    LOG(INFO) << "reducing memory of " << *channel << " because total load mem consumption "
-              << _mem_tracker->consumption() << " has exceeded limit " << _mem_tracker->limit();
-    return channel->handle_mem_exceed_limit(true, response);
+    // No matter whether the soft limit or the hard limit was reached, only 1 thread will
+    // wait here; if the hard limit was reached, other threads will be pending at the
+    // beginning of this method.
+    Status st = channel->handle_mem_exceed_limit(response);
+    LOG(INFO) << "reduce memory of " << *channel << " finished";
+
+    {
+        std::lock_guard<std::mutex> l(_lock);
+        if (_should_wait_flush) {
+            _should_wait_flush = false;
+            _wait_flush_cond.notify_all();
+        }
+        if (_reduce_memory_channel == channel) {
+            _reduce_memory_channel = nullptr;
+        }
+    }
+    return st;
 }
 
 } // namespace doris
diff --git a/be/src/runtime/tablets_channel.cpp b/be/src/runtime/tablets_channel.cpp
index 09a912306c..8d8fed46dc 100644
--- a/be/src/runtime/tablets_channel.cpp
+++ b/be/src/runtime/tablets_channel.cpp
@@ -195,78 +195,96 @@ void TabletsChannel::_close_wait(DeltaWriter* writer,
 }
 
 template <typename TabletWriterAddResult>
-Status TabletsChannel::reduce_mem_usage(int64_t mem_limit, TabletWriterAddResult* response) {
-    std::lock_guard<std::mutex> l(_lock);
-    if (_state == kFinished) {
-        // TabletsChannel is closed without LoadChannel's lock,
-        // therefore it's possible for reduce_mem_usage() to be called right after close()
-        return _close_status;
-    }
-
-    // Sort the DeltaWriters by mem consumption in descend order.
-    std::vector<DeltaWriter*> writers;
-    for (auto& it : _tablet_writers) {
-        it.second->save_memtable_consumption_snapshot();
-        writers.push_back(it.second);
-    }
-    std::sort(writers.begin(), writers.end(), [](const DeltaWriter* lhs, const DeltaWriter* rhs) {
-        return lhs->get_memtable_consumption_snapshot() > rhs->get_memtable_consumption_snapshot();
-    });
+Status TabletsChannel::reduce_mem_usage(TabletWriterAddResult* response) {
+    _try_to_wait_flushing();
+    std::vector<DeltaWriter*> writers_to_flush;
+    {
+        std::lock_guard<std::mutex> l(_lock);
+        if (_state == kFinished) {
+            // TabletsChannel is closed without LoadChannel's lock,
+            // therefore it's possible for reduce_mem_usage() to be called right after close()
+            return _close_status;
+        }
 
-    // Decide which writes should be flushed to reduce mem consumption.
-    // The main idea is to flush at least one third of the mem_limit.
-    // This is mainly to solve the following scenarios.
-    // Suppose there are N tablets in this TabletsChannel, and the mem limit is M.
-    // If the data is evenly distributed, when each tablet memory accumulates to M/N,
-    // the reduce memory operation will be triggered.
-    // At this time, the value of M/N may be much smaller than the value of `write_buffer_size`.
-    // If we flush all the tablets at this time, each tablet will generate a lot of small files.
-    // So here we only flush part of the tablet, and the next time the reduce memory operation is triggered,
-    // the tablet that has not been flushed before will accumulate more data, thereby reducing the number of flushes.
-
-    int64_t mem_to_flushed = mem_limit / 3;
-    int counter = 0;
-    int64_t sum = 0;
-    for (auto writer : writers) {
-        if (writer->memtable_consumption() <= 0) {
-            break;
+        // Sort the DeltaWriters by mem consumption in descend order.
+        std::vector<DeltaWriter*> writers;
+        for (auto& it : _tablet_writers) {
+            it.second->save_memtable_consumption_snapshot();
+            writers.push_back(it.second);
         }
-        ++counter;
-        sum += writer->memtable_consumption();
-        if (sum > mem_to_flushed) {
-            break;
+        std::sort(writers.begin(), writers.end(),
+                  [](const DeltaWriter* lhs, const DeltaWriter* rhs) {
+                      return lhs->get_memtable_consumption_snapshot() >
+                             rhs->get_memtable_consumption_snapshot();
+                  });
+
+        // Decide which writes should be flushed to reduce mem consumption.
+        // The main idea is to flush at least one third of the mem_limit.
+        // This is mainly to solve the following scenarios.
+        // Suppose there are N tablets in this TabletsChannel, and the mem limit is M.
+        // If the data is evenly distributed, when each tablet memory accumulates to M/N,
+        // the reduce memory operation will be triggered.
+        // At this time, the value of M/N may be much smaller than the value of `write_buffer_size`.
+        // If we flush all the tablets at this time, each tablet will generate a lot of small files.
+        // So here we only flush part of the tablet, and the next time the reduce memory operation is triggered,
+        // the tablet that has not been flushed before will accumulate more data, thereby reducing the number of flushes.
+
+        int64_t mem_to_flushed = _mem_tracker->consumption() / 3;
+        int counter = 0;
+        int64_t sum = 0;
+        for (auto writer : writers) {
+            if (writer->mem_consumption() <= 0) {
+                break;
+            }
+            ++counter;
+            sum += writer->mem_consumption();
+            if (sum > mem_to_flushed) {
+                break;
+            }
         }
-    VLOG_CRITICAL << "flush " << counter << " memtables to reduce memory: " << sum;
-    google::protobuf::RepeatedPtrField<PTabletError>* tablet_errors =
-            response->mutable_tablet_errors();
-    for (int i = 0; i < counter; i++) {
-        Status st = writers[i]->flush_memtable_and_wait(false);
-        if (!st.ok()) {
-            auto err_msg = strings::Substitute(
-                    "tablet writer failed to reduce mem consumption by flushing memtable, "
-                    "tablet_id=$0, txn_id=$1, err=$2, errcode=$3, msg:$4",
-                    writers[i]->tablet_id(), _txn_id, st.code(), st.precise_code(),
-                    st.get_error_msg());
-            LOG(WARNING) << err_msg;
-            PTabletError* error = tablet_errors->Add();
-            error->set_tablet_id(writers[i]->tablet_id());
-            error->set_msg(err_msg);
-            _broken_tablets.insert(writers[i]->tablet_id());
+        VLOG_CRITICAL << "flush " << counter << " memtables to reduce memory: " << sum;
+        google::protobuf::RepeatedPtrField<PTabletError>* tablet_errors =
+                response->mutable_tablet_errors();
+        // The following loop triggers the memtable flushes asynchronously; it is done while holding _lock.
+        for (int i = 0; i < counter; i++) {
+            Status st = writers[i]->flush_memtable_and_wait(false);
+            if (!st.ok()) {
+                auto err_msg = strings::Substitute(
+                        "tablet writer failed to reduce mem consumption by flushing memtable, "
+                        "tablet_id=$0, txn_id=$1, err=$2, errcode=$3, msg:$4",
+                        writers[i]->tablet_id(), _txn_id, st.code(), st.precise_code(),
+                        st.get_error_msg());
+                LOG(WARNING) << err_msg;
+                PTabletError* error = tablet_errors->Add();
+                error->set_tablet_id(writers[i]->tablet_id());
+                error->set_msg(err_msg);
+                _broken_tablets.insert(writers[i]->tablet_id());
+            }
+        }
+        for (int i = 0; i < counter; i++) {
+            if (_broken_tablets.find(writers[i]->tablet_id()) != _broken_tablets.end()) {
+                // skip broken tablets
+                continue;
+            }
+            writers_to_flush.push_back(writers[i]);
         }
+        _reducing_mem_usage = true;
     }
-    for (int i = 0; i < counter; i++) {
-        if (_broken_tablets.find(writers[i]->tablet_id()) != _broken_tablets.end()) {
-            // skip broken tablets
-            continue;
-        }
-        Status st = writers[i]->wait_flush();
+    for (auto writer : writers_to_flush) {
+        Status st = writer->wait_flush();
         if (!st.ok()) {
             return Status::InternalError(
                     "failed to reduce mem consumption by flushing memtable. err: {}", st);
         }
     }
+
+    {
+        std::lock_guard<std::mutex> l(_lock);
+        _reducing_mem_usage = false;
+        _reduce_memory_cond.notify_all();
+    }
+
     return Status::OK();
 }
 
@@ -316,6 +334,13 @@ Status TabletsChannel::_open_all_writers(const PTabletWriterOpenRequest& request
     return Status::OK();
 }
 
+void TabletsChannel::_try_to_wait_flushing() {
+    std::unique_lock<std::mutex> l(_lock);
+    while (_reducing_mem_usage) {
+        _reduce_memory_cond.wait(l);
+    }
+}
+
 Status TabletsChannel::cancel() {
     std::lock_guard<std::mutex> l(_lock);
     if (_state == kFinished) {
@@ -422,7 +447,7 @@ template Status TabletsChannel::add_batch<PTabletWriterAddBlockRequest,
                                           PTabletWriterAddBlockResult>(
         PTabletWriterAddBlockRequest const&, PTabletWriterAddBlockResult*);
 template Status TabletsChannel::reduce_mem_usage<PTabletWriterAddBatchResult>(
-        int64_t, PTabletWriterAddBatchResult*);
+        PTabletWriterAddBatchResult*);
 template Status TabletsChannel::reduce_mem_usage<PTabletWriterAddBlockResult>(
-        int64_t, PTabletWriterAddBlockResult*);
+        PTabletWriterAddBlockResult*);
 
 } // namespace doris
diff --git a/be/src/runtime/tablets_channel.h b/be/src/runtime/tablets_channel.h
index dc54740b09..e11e314e6d 100644
--- a/be/src/runtime/tablets_channel.h
+++ b/be/src/runtime/tablets_channel.h
@@ -30,6 +30,7 @@
 #include "runtime/memory/mem_tracker.h"
 #include "runtime/thread_context.h"
 #include "util/bitmap.h"
+#include "util/countdown_latch.h"
 #include "util/priority_thread_pool.hpp"
 #include "util/uid_util.h"
 #include "vec/core/block.h"
@@ -93,7 +94,7 @@ public:
     // return Status::OK if mem is reduced.
     // no-op when this channel has been closed or cancelled
     template <typename TabletWriterAddResult>
-    Status reduce_mem_usage(int64_t mem_limit, TabletWriterAddResult* response);
+    Status reduce_mem_usage(TabletWriterAddResult* response);
 
     int64_t mem_consumption() const { return _mem_tracker->consumption(); }
 
@@ -104,6 +105,8 @@ private:
     // open all writer
     Status _open_all_writers(const PTabletWriterOpenRequest& request);
 
+    void _try_to_wait_flushing();
+
     // deal with DeltaWriter close_wait(), add tablet to list for return.
     void _close_wait(DeltaWriter* writer,
                      google::protobuf::RepeatedPtrField<PTabletInfo>* tablet_vec,
@@ -147,6 +150,12 @@ private:
     // So that following batch will not handle this tablet anymore.
     std::unordered_set<int64_t> _broken_tablets;
 
+    bool _reducing_mem_usage = false;
+    // only one thread can reduce memory for one TabletsChannel.
+    // if some other thread call `reduce_memory_usage` at the same time,
+    // it will wait on this condition variable.
+    std::condition_variable _reduce_memory_cond;
+
     std::unordered_set<int64_t> _partition_ids;
 
     static std::atomic<uint64_t> _s_tablet_writer_count;
diff --git a/docs/en/docs/admin-manual/config/be-config.md b/docs/en/docs/admin-manual/config/be-config.md
index 1e9ea87bb8..43edeec401 100644
--- a/docs/en/docs/admin-manual/config/be-config.md
+++ b/docs/en/docs/admin-manual/config/be-config.md
@@ -706,12 +706,18 @@ Set these default values very large, because we don't want to affect load perfor
 
 ### `load_process_max_memory_limit_percent`
 
-Default: 80 (%)
+Default: 50 (%)
 
-The percentage of the upper memory limit occupied by all imported threads on a single node, the default is 80%
+The percentage of the upper memory limit occupied by all imported threads on a single node, the default is 50%
 
 Set these default values very large, because we don't want to affect load performance when users upgrade Doris.
 If necessary, the user should set these configurations correctly
 
+### `load_process_soft_mem_limit_percent`
+
+Default: 50 (%)
+
+The soft limit refers to the proportion of the load memory limit of a single node. For example, if the load memory limit for all load tasks is 20GB, the soft limit defaults to 50% of this value, that is, 10GB. When the load memory usage exceeds the soft limit, the job with the largest memory consumption will be selected and flushed to release memory space. The default is 50%.
+
 ### `log_buffer_level`
 
 Default: empty
diff --git a/docs/zh-CN/docs/admin-manual/config/be-config.md b/docs/zh-CN/docs/admin-manual/config/be-config.md
index a58bf80395..b0f9af9999 100644
--- a/docs/zh-CN/docs/admin-manual/config/be-config.md
+++ b/docs/zh-CN/docs/admin-manual/config/be-config.md
@@ -707,12 +707,18 @@ load错误日志将在此时间后删除
 
 ### `load_process_max_memory_limit_percent`
 
-默认值:80
+默认值:50
 
-单节点上所有的导入线程占据的内存上限比例,默认80%
+单节点上所有的导入线程占据的内存上限比例,默认50%
 
 将这些默认值设置得很大,因为我们不想在用户升级 Doris 时影响负载性能。
 如有必要,用户应正确设置这些配置。
 
+### `load_process_soft_mem_limit_percent`
+
+默认值:50
+
+soft limit是指单节点导入内存上限的比例。例如所有导入任务导入的内存上限是20GB,则soft limit默认为该值的50%,即10GB。导入内存占用超过soft limit时,会挑选占用内存最大的作业进行下刷以提前释放内存空间,默认50%
+
 ### `log_buffer_level`
 
 默认值:空

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org