morningman commented on code in PR #61518:
URL: https://github.com/apache/doris/pull/61518#discussion_r2968354256


##########
be/src/vec/exec/scan/olap_scanner.cpp:
##########
@@ -414,6 +398,16 @@ Status OlapScanner::_init_tablet_reader_params(
 
     _tablet_reader_params.profile = _local_state->custom_profile();
     _tablet_reader_params.runtime_state = _state;
+    {
+        auto* olap_scan_local_state = 
(pipeline::OlapScanLocalState*)_local_state;

Review Comment:
   ```suggestion
           auto* olap_scan_local_state = 
static_cast<pipeline::OlapScanLocalState*>(_local_state);
   ```



##########
gensrc/thrift/PlanNodes.thrift:
##########


Review Comment:
   Can `table_name` be put at the `scan node` level instead of the `file range` level?
   Because there will be so many splits (file ranges).



##########
be/src/io/cache/fs_file_cache_storage.cpp:
##########
@@ -884,6 +850,12 @@ void 
FSFileCacheStorage::load_cache_info_into_memory_from_db(BlockFileCache* mgr
         ctx.expiration_time = meta_value.ttl;
         ctx.tablet_id =
                 meta_key.tablet_id; //TODO(zhengyu): zero if loaded from v2, 
we can use this to decide whether the block is loaded from v2 or v3
+        if (meta_value.context_id != 0) {
+            if (auto context = 
_meta_store->get_context(meta_value.context_id); context) {

Review Comment:
   In `load_cache_info_into_memory_from_db`, each block calls 
`_meta_store->get_context(xxx)`, resulting in many redundant lookups for the 
same context_id.
   
   I suggest we can use a map to cache it:
   ```
   std::unordered_map<uint64_t, std::pair<std::string, std::string>> 
context_cache;
   // ...
   if (meta_value.context_id != 0) {
       auto it = context_cache.find(meta_value.context_id);
       if (it != context_cache.end()) {
           ctx.table_name = it->second.first;
           ctx.partition_name = it->second.second;
       } else if (auto context = 
_meta_store->get_context(meta_value.context_id); context) {
           ctx.table_name = context->first;
           ctx.partition_name = context->second;
           context_cache.emplace(meta_value.context_id, *context);
       }
   }```



##########
be/src/io/cache/cache_block_meta_store.cpp:
##########
@@ -332,41 +420,86 @@ std::unique_ptr<BlockMetaIterator> 
CacheBlockMetaStore::get_all() {
     return std::unique_ptr<BlockMetaIterator>(new RocksDBIterator(iter));
 }
 
-size_t CacheBlockMetaStore::approximate_entry_count() const {
-    if (!_db) {
-        LOG(WARNING) << "Database not initialized when counting entries";
+void CacheBlockMetaStore::delete_key(const BlockMetaKey& key) {
+    std::string key_str = serialize_key(key);
+
+    // Put delete task into queue for asynchronous processing
+    WriteOperation op;
+    op.type = OperationType::DELETE;
+    op.key = key_str;
+    _write_queue.enqueue(op);
+}
+
+uint64_t CacheBlockMetaStore::get_or_create_context_id(std::string_view 
table_name,
+                                                       std::string_view 
partition_name) {
+    if ((table_name.empty() && partition_name.empty()) || !_db || 
!_context_dict_cf_handle) {
         return 0;
     }
 
-    rocksdb::ReadOptions read_options;
-    std::unique_ptr<rocksdb::Iterator> iter(
-            _db->NewIterator(read_options, _file_cache_meta_cf_handle.get()));
-    if (!iter) {
-        LOG(WARNING) << "Failed to create iterator when counting entries";
-        return 0;
+    const std::string lookup_key = _build_context_key(table_name, 
partition_name);
+    std::string value;
+    rocksdb::Status status =
+            _db->Get(rocksdb::ReadOptions(), _context_dict_cf_handle.get(), 
lookup_key, &value);
+    if (status.ok()) {
+        auto context_id = parse_context_lookup_value(value);
+        if (context_id.has_value()) {
+            return *context_id;
+        }
+        LOG(WARNING) << "Invalid context lookup value for key";
+    } else if (!status.IsNotFound()) {
+        LOG(WARNING) << "Failed to get context id from rocksdb: " << 
status.ToString();
     }
 
-    size_t count = 0;
-    for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
-        ++count;
+    std::lock_guard<std::mutex> lock(_context_mutex);
+
+    value.clear();
+    status = _db->Get(rocksdb::ReadOptions(), _context_dict_cf_handle.get(), 
lookup_key, &value);
+    if (status.ok()) {
+        auto context_id = parse_context_lookup_value(value);
+        if (context_id.has_value()) {
+            return *context_id;
+        }
+        LOG(WARNING) << "Invalid context lookup value after retry for key";
+        return 0;
+    }
+    if (!status.IsNotFound()) {
+        LOG(WARNING) << "Failed to get context id from rocksdb after retry: " 
<< status.ToString();
+        return 0;
     }
 
-    if (!iter->status().ok()) {
-        LOG(WARNING) << "Iterator encountered error when counting entries: "
-                     << iter->status().ToString();
+    const uint64_t context_id = _next_context_id.fetch_add(1, 
std::memory_order_acq_rel);
+
+    rocksdb::WriteBatch batch;
+    batch.Put(_context_dict_cf_handle.get(), lookup_key, 
build_context_lookup_value(context_id));
+    batch.Put(_context_dict_cf_handle.get(), build_context_id_key(context_id),
+              _build_context_value(table_name, partition_name));
+
+    status = _db->Write(rocksdb::WriteOptions(), &batch);
+    if (!status.ok()) {
+        LOG(WARNING) << "Failed to create context id in rocksdb: " << 
status.ToString();
+        _next_context_id.store(context_id, std::memory_order_release);

Review Comment:
   `_next_context_id` rollback risk.
   
   Problem: If thread A does `fetch_add` and gets id=5, then thread B does 
`fetch_add`, gets id=6, and successfully writes, and then thread A's write 
fails and executes `store(5)`, `_next_context_id` is rolled back to 5. The 
next allocation will yield 5 again, causing an id collision.
   
   Suggestion: Do not roll back `_next_context_id` on write failure. Wasting 
a single id is perfectly acceptable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to