github-actions[bot] commented on code in PR #61071:
URL: https://github.com/apache/doris/pull/61071#discussion_r2979000299


##########
be/src/io/cache/mem_file_cache_storage.cpp:
##########
@@ -21,87 +21,59 @@
 #include <mutex>
 #include <system_error>
 
+#include "common/config.h"
 #include "common/logging.h"
 #include "io/cache/block_file_cache.h"
 #include "io/cache/file_block.h"
 #include "io/cache/file_cache_common.h"
+#include "io/cache/file_cache_storage.h"
+#include "io/cache/shard_mem_cache.h"
 #include "runtime/exec_env.h"
 #include "vec/common/hex.h"
 
 namespace doris::io {
 
-MemFileCacheStorage::~MemFileCacheStorage() {}
+MemFileCacheStorage::~MemFileCacheStorage() = default;
 
 Status MemFileCacheStorage::init(BlockFileCache* _mgr) {
     LOG_INFO("init in-memory file cache storage");
     _mgr->_async_open_done = true; // no data to load for memory storage
-    return Status::OK();
-}
 
-Status MemFileCacheStorage::append(const FileCacheKey& key, const Slice& 
value) {
-    std::lock_guard<std::mutex> lock(_cache_map_mtx);
-
-    auto map_key = std::make_pair(key.hash, key.offset);
-    auto iter = _cache_map.find(map_key);
-    if (iter != _cache_map.end()) {
-        // despite the name append, it is indeed a put, so the key should not 
exist
-        LOG_WARNING("key already exists in in-memory cache map")
-                .tag("hash", key.hash.to_string())
-                .tag("offset", key.offset);
-        DCHECK(false);
-        return Status::IOError("key already exists in in-memory cache map");
+    _shard_nums = config::file_cache_mem_storage_shard_num;
+    if ((_shard_nums & _shard_nums - 1) != 0) {

Review Comment:
   **Bug (HIGH):** This power-of-2 check does not reject `_shard_nums == 0`. 
When `_shard_nums` is 0:
   - `0 & (0 - 1)` = `0 & 0xFFFFFFFFFFFFFFFF` = `0`, so the check passes
   - `_shard_mask` = `0 - 1` = `0xFFFFFFFFFFFFFFFF`
   - The `for` loop creates zero shards (empty vector)
   - Any subsequent `get_shard_num()` returns a large index into an empty 
vector → **out-of-bounds access / crash**
   
   Also, since `config::file_cache_mem_storage_shard_num` is `int32_t`, a 
negative value would be implicitly cast to a large `uint64_t`.
   
   Suggested fix:
   ```cpp
   if (_shard_nums == 0 || (_shard_nums & (_shard_nums - 1)) != 0) {
   ```
   
   Additionally, adding explicit parentheses around `(_shard_nums - 1)` 
improves readability and avoids compiler `-Wparentheses` warnings, even though 
the current precedence happens to be correct.



##########
be/src/io/cache/shard_mem_cache.cpp:
##########
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "io/cache/shard_mem_cache.h"
+
+#include <cstring>
+#include <memory>
+#include <mutex>
+
+#include "common/logging.h"
+
+namespace doris::io {
+
+Status ShardMemHashTable::remove(const FileCacheKey& key) {
+    // find and clear the one in _cache_map
+    std::unique_lock<std::shared_mutex> lock(_shard_mt);
+    auto map_key = std::make_pair(key.hash, key.offset);
+    auto iter = _cache_map.find(map_key);
+    if (iter == _cache_map.end()) {
+        LOG_WARNING("key not found in cache map")
+                .tag("hash", key.hash.to_string())
+                .tag("offset", key.offset);
+        return Status::IOError("key not found in in-memory cache map when 
remove");
+    }
+    _cache_map.erase(iter);
+
+    return Status::OK();
+}
+
+Status ShardMemHashTable::append(const FileCacheKey& key, const Slice& value) {
+    std::unique_lock<std::shared_mutex> lock(_shard_mt);
+
+    auto map_key = std::make_pair(key.hash, key.offset);
+    auto iter = _cache_map.find(map_key);
+    if (iter != _cache_map.end()) {
+        // despite the name append, it is indeed a put, so the key should not 
exist
+        LOG_WARNING("key already exists in in-memory cache map")
+                .tag("hash", key.hash.to_string())
+                .tag("offset", key.offset);
+        DCHECK(false);
+        return Status::IOError("key already exists in in-memory cache map");
+    }
+    // TODO(zhengyu): allocate in mempool
+    auto mem_block =
+            MemBlock {std::shared_ptr<char[]>(new char[value.size], 
std::default_delete<char[]>()),
+                      value.size};
+    DCHECK(mem_block.addr != nullptr);
+    _cache_map[map_key] = mem_block;
+    char* dst = mem_block.addr.get();
+    // TODO(zhengyu): zero copy!
+    memcpy(dst, value.data, value.size);
+
+    return Status::OK();
+}
+
+Status ShardMemHashTable::read(const FileCacheKey& key, size_t value_offset, 
Slice buffer) {
+    std::shared_lock<std::shared_mutex> lock(_shard_mt);
+    auto map_key = std::make_pair(key.hash, key.offset);
+    auto iter = _cache_map.find(map_key);
+    if (iter == _cache_map.end()) {
+        LOG_WARNING("key not found in cache map")
+                .tag("hash", key.hash.to_string())
+                .tag("offset", key.offset);
+        return Status::IOError("key not found in in-memory cache map when 
read");
+    }
+
+    auto mem_block = iter->second;

Review Comment:
   **Nit (LOW):** `auto mem_block = iter->second;` copies the `MemBlock` 
struct, which includes a `shared_ptr<char[]>`. This involves an atomic 
reference count increment/decrement. Since we only need read access under the 
shared lock, this can be avoided:
   ```cpp
   const auto& mem_block = iter->second;
   ```



##########
be/src/io/cache/mem_file_cache_storage.h:
##########
@@ -49,8 +46,17 @@ class MemFileCacheStorage : public FileCacheStorage {
     FileCacheStorageType get_type() override { return MEMORY; }
 
 private:
-    std::unordered_map<FileWriterMapKey, MemBlock, FileWriterMapKeyHash> 
_cache_map;
-    std::mutex _cache_map_mtx;
+    uint64_t get_shard_num(const FileWriterMapKey& key) const {
+        FileWriterMapKeyHash hash_func;
+        std::size_t hash = hash_func(key);
+
+        return hash & _shard_mask;
+    }
+
+    // new code

Review Comment:
   **Nit:** This `// new code` comment appears to be a leftover from 
development. Please remove it — it doesn't convey useful information.



##########
be/src/common/config.h:
##########
@@ -1277,7 +1277,7 @@ 
DECLARE_mInt64(file_cache_background_lru_log_replay_interval_ms);
 DECLARE_mBool(enable_evaluate_shadow_queue_diff);
 
 DECLARE_mBool(file_cache_enable_only_warm_up_idx);
-
+DECLARE_mInt32(file_cache_mem_storage_shard_num);

Review Comment:
   **Issue (MEDIUM):** This config is declared as mutable (`DECLARE_mInt32`), 
meaning it can be changed at runtime. However, it is only read once during 
`MemFileCacheStorage::init()` and the shard count/mask/vector are never 
re-initialized afterwards. A runtime change would be silently ignored, giving 
users a false impression that the change took effect.
   
   Recommendation: Change to `DECLARE_Int32` (immutable) since the value cannot 
take effect without reinitializing the entire cache storage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to