(doris-website) branch master updated: [Doc](exec) Support condition cache doc (#2963)

lihaopeng Sat, 18 Oct 2025 10:42:04 -0700

This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new b51e1797ed0 [Doc](exec) Support condition cache doc (#2963)
b51e1797ed0 is described below

commit b51e1797ed0685074a4bd418c91fa30dc3dad27f
Author: HappenLee <[email protected]>
AuthorDate: Mon Oct 13 11:33:36 2025 +0800

    [Doc](exec) Support condition cache doc (#2963)
    
    ## Versions
    
    - [x] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/query-acceleration/condition-cache.md         | 109 +++++++++++++++++++++
 .../current/query-acceleration/condition-cache.md  | 103 +++++++++++++++++++
 sidebars.json                                      |   3 +-
 3 files changed, 214 insertions(+), 1 deletion(-)

diff --git a/docs/query-acceleration/condition-cache.md b/docs/query-acceleration/condition-cache.md
new file mode 100644
index 00000000000..6066ab9dc20
--- /dev/null
+++ b/docs/query-acceleration/condition-cache.md
@@ -0,0 +1,109 @@
+# Condition Cache
+
+## Introduction
+
+In large-scale analytical workloads, queries often include **repeated 
filtering conditions (Conditions)**, for example:
+
+```
+SELECT * FROM orders WHERE region = 'ASIA';
+SELECT count(*) FROM orders WHERE region = 'ASIA';
+```
+
+Such queries repeatedly execute the same filtering logic on identical data 
segments, leading to **redundant CPU and I/O overhead**.
+
+To address this, **Apache Doris introduces the Condition Cache mechanism**.
+It caches the filtering results of specific conditions on a given segment, so subsequent queries can **reuse those results directly**. This **reduces unnecessary scans and filtering operations** and lowers query latency.
+
+## Working Principle
+
+The core concept of the Condition Cache is:
+
+- **The same filtering condition produces the same result on the same data 
segment.**
+- Doris generates a **64-bit digest** from the combination of “condition 
expression + key range,” which serves as a unique cache identifier.
+- Each segment can then look up existing filtering results in the cache using 
this digest.
+
+Cached results are stored as compressed **bit vectors (`std::vector<bool>`)**:
+
+- **0** indicates that the row range does not meet the condition and can be 
skipped directly;
+- **1** indicates that the range may contain matching data and needs further 
scanning.
+
+Through this mechanism, Doris can quickly eliminate irrelevant data blocks at 
a coarse granularity, performing fine-grained filtering only when necessary.
+
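The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Doris's implementation: the hash function (BLAKE2b truncated to 64 bits), the `ROWS_PER_RANGE` granularity, the segment identifier, and every function name here are hypothetical and chosen only for the example.

```python
import hashlib

ROWS_PER_RANGE = 4  # hypothetical granularity: one cached bit covers 4 rows


def digest64(condition: str, key_range: str) -> int:
    """64-bit digest of "condition expression + key range" (hash choice is an assumption)."""
    data = f"{condition}|{key_range}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def build_bits(rows, predicate):
    """One flag per row range: True if any row in the range matches, else False."""
    return [
        any(predicate(r) for r in rows[i:i + ROWS_PER_RANGE])
        for i in range(0, len(rows), ROWS_PER_RANGE)
    ]


def scan(cache, segment_id, rows, condition, key_range, predicate):
    """Return (matching rows, number of rows skipped thanks to cached flags)."""
    key = (segment_id, digest64(condition, key_range))
    bits = cache.get(key)
    if bits is None:
        # Cache miss: scan everything, then remember which ranges can match.
        cache[key] = build_bits(rows, predicate)
        return [r for r in rows if predicate(r)], 0

    matched, skipped = [], 0
    for idx, bit in enumerate(bits):
        chunk = rows[idx * ROWS_PER_RANGE:(idx + 1) * ROWS_PER_RANGE]
        if not bit:
            skipped += len(chunk)  # 0: the whole range is skipped
            continue
        # 1: the range may match, so the exact filter still runs on it.
        matched.extend(r for r in chunk if predicate(r))
    return matched, skipped
```

Calling `scan` twice with the same segment, condition, and key range shows the intended effect: the first call scans every row and fills the cache, while the second call evaluates the predicate only on ranges whose flag is set.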
+## Applicable Scenarios
+
+Condition Cache is most effective in the following cases:
+
+- **Repeated conditions**: Identical or similar filter conditions are 
frequently used.
+- **Relatively stable data**: Data inside a segment is typically immutable 
(new segments are generated after INSERT/Compaction, naturally invalidating old 
caches).
+- **High selectivity**: Filters that leave only a small subset of rows benefit most, because the cached results let Doris skip the largest share of the scan.
+
+Condition Cache will **not** be used in the following situations:
+
+- Queries containing **delete predicates** (to ensure correctness, caching is 
disabled).
+- **TopN runtime filters**, which are generated during query execution (currently unsupported).
+
+## Configuration and Management
+
+### Enable or Disable
+
+```
+SET enable_condition_cache = true;
+```
+
+### Memory Management
+
+- Condition Cache uses an **LRU policy** for cache eviction.
+- When exceeding `condition_cache_limit`, the least recently used entries are 
automatically cleared.
+
+You can modify the memory limit in `be.conf`:
+
+```
+condition_cache_limit = 1024  # Unit: MB
+```
+
+- After segment compaction, cache entries for the old segments expire naturally as LRU eviction proceeds.
+
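The eviction behavior described above can be illustrated with a small size-bounded LRU cache. This is a sketch only: the class name `LruByteCache`, its methods, and the per-entry `size` argument are assumptions for the example and do not reflect how Doris accounts for memory internally.

```python
from collections import OrderedDict


class LruByteCache:
    """LRU cache bounded by the total size of its entries (illustrative only)."""

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> (value, size)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value, size: int):
        if key in self._entries:
            # Replacing an entry: release its old size first.
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (value, size)
        self.used_bytes += size
        # Evict least recently used entries until the limit is respected.
        while self.used_bytes > self.limit_bytes and self._entries:
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used_bytes -= old_size

    def __contains__(self, key):
        return key in self._entries
```

In this sketch, an entry that is never looked up again (for example, one belonging to a segment that no longer exists after compaction) drifts to the least recently used end and is the first to be evicted once the limit is exceeded.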
+## Cache Statistics
+
+Doris provides comprehensive metrics to help users monitor the effectiveness 
of Condition Cache:
+
+- **Profile-level metrics** (visible in query execution plans)
+  - `ConditionCacheSegmentHit`: Number of segments that hit the cache
+  - `ConditionCacheFilteredRows`: Number of rows skipped directly by cached 
results
+- **System metrics** (viewable via the monitoring system or `/metrics`)
+  - `condition_cache_search_count`: Total cache lookup count
+  - `condition_cache_hit_count`: Number of successful cache hits
+
+These metrics help evaluate the cache’s benefit and hit ratio.
+
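As a rough illustration of how the two system metrics can be combined, the hit ratio is the number of hits divided by the number of lookups. The sketch below assumes both counters are cumulative, so it compares two snapshots taken at different times; the function names and the dictionary layout are hypothetical.

```python
def hit_ratio(search_count: int, hit_count: int) -> float:
    """Fraction of cache lookups that hit; 0.0 when there were no lookups."""
    if search_count <= 0:
        return 0.0
    return hit_count / search_count


def hit_ratio_between(before: dict, after: dict) -> float:
    """Hit ratio over the interval between two snapshots of the counters."""
    searches = after["condition_cache_search_count"] - before["condition_cache_search_count"]
    hits = after["condition_cache_hit_count"] - before["condition_cache_hit_count"]
    return hit_ratio(searches, hits)
```

A persistently low ratio over an interval suggests that the workload rarely repeats conditions on the same segments, in which case the cache brings little benefit.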
+## Usage Example
+
+### Typical Scenario
+
+Consider the following query:
+
+```
+SELECT order_id, amount
+FROM orders
+WHERE region = 'ASIA' AND order_date >= '2023-01-01';
+```
+
+- **First execution**: The query performs a full scan and evaluates the 
filter; the Condition Cache stores the result in the LRU cache.
+- **Subsequent identical queries**: They reuse the cached results, skipping 
most irrelevant row ranges and scanning only potential matches.
+
+When multiple queries share the same filtering condition (e.g., `region = 
'ASIA' AND order_date >= '2023-01-01'`), they can reuse each other’s Condition 
Cache entries, reducing overall workload.
+
+## Notes
+
+- **Cache is not persistent**: The Condition Cache is cleared upon Doris 
restart.
+- **Delete operations disable caching**: Segments with delete markers require 
strict consistency and thus do not use the cache.
+
+## Summary
+
+Condition Cache is an optimization mechanism in Doris designed for **repeated 
conditional queries**. Its advantages include:
+
+- Avoiding redundant computation and reducing CPU/I/O overhead
+- Taking effect automatically and transparently, without user intervention
+- Consuming little memory while delivering large gains when the cache hit rate and filter rate are high
+
+By leveraging the Condition Cache effectively, users can achieve significantly 
faster response times in high-frequency OLAP query scenarios.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/condition-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/condition-cache.md
new file mode 100644
index 00000000000..cbf900f30aa
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/condition-cache.md
@@ -0,0 +1,103 @@
+# Condition Cache
+
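+## Introduction
+
+In large-scale analytical workloads, queries often include repeated filtering conditions (Conditions), for example:
+
+```
+SELECT * FROM orders WHERE region = 'ASIA'; SELECT count(*) FROM orders WHERE region = 'ASIA';
+```
+
+Such queries repeatedly execute the same filtering logic on the same data segments, causing **redundant CPU and I/O overhead**.
+
+To address this, Apache Doris introduces the **Condition Cache** mechanism. It caches the filtering results of specific conditions on a given segment and reuses them directly in subsequent queries, **reducing unnecessary scans and filtering** and significantly lowering query latency.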
+
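+## Working Principle
+
+The core idea of Condition Cache is:
+
+- **The same filtering condition produces the same result on the same data segment.**
+- Doris generates a **64-bit digest** from "condition expression + key range", which serves as the unique cache identifier.
+- Each segment can use this digest to look up existing filtering results in the cache.
+
+Cached results are stored as compressed **bit vectors (`std::vector<bool>`)**:
+
+- **0** means the row range does not satisfy the condition and can be skipped directly;
+- **1** means the range may contain matching data and must still be scanned.
+
+In this way, Doris can quickly discard irrelevant data blocks at a coarse granularity and apply exact filtering only where necessary.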
+
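+## Applicable Conditions
+
+Condition Cache is most effective in the following scenarios:
+
+**Repeated conditions**: Identical or similar filter conditions are used frequently.
+
+**Relatively stable data**: Data inside a segment is normally immutable (INSERT/Compaction produces new segments, which naturally retires old cache entries).
+
+**High selectivity**: The filter leaves only a small number of rows, which maximizes the reduction in scanning.
+
+Condition Cache is not used in the following scenarios:
+
+- Queries containing **delete predicates** (delete markers require guaranteed correctness, so caching is disabled).
+- **TopN runtime filters** generated at runtime (not yet supported).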
+
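+## Configuration and Management
+
+### Enable or Disable
+
+```Plain
+set enable_condition_cache = true;
+```
+
+### Memory Management
+
+- Condition Cache uses an **LRU policy** for cache eviction.
+- When `condition_cache_limit` is exceeded, the least recently used entries are cleared automatically.
+
+To change the limit, set the parameter in `be.conf`: `condition_cache_limit = 1024` (unit: MB).
+
+- After segment compaction, old cache entries also expire naturally through LRU eviction.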
+
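+## Cache Statistics
+
+Doris provides a rich set of metrics that let users observe the effect of Condition Cache:
+
+- **Profile-level metrics** (visible in query execution plans)
+  - `ConditionCacheSegmentHit`: Number of segments that hit the cache
+  - `ConditionCacheFilteredRows`: Number of rows filtered out directly by the cache
+- **System metrics** (viewable via the monitoring system or `metrics`)
+  - `condition_cache_search_count`: Number of cache lookups
+  - `condition_cache_hit_count`: Number of cache hits
+
+Users can use these metrics to evaluate the benefit and hit ratio of Condition Cache.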
+
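+## Usage Example
+
+### Typical Scenario
+
+Suppose we have the following query:
+
+```
+SELECT order_id, amount FROM orders WHERE region = 'ASIA' AND order_date >= '2023-01-01';
+```
+
+- **First execution**: A full scan and condition evaluation are required; Condition Cache stores the result in the LRU cache.
+- **Subsequent identical queries**: They use the cache directly, skipping most irrelevant row ranges and scanning only the parts that may satisfy the condition.
+
+When multiple queries share the same filtering condition (for example, `region = 'ASIA' AND order_date >= '2023-01-01'`), they can also reuse each other's Condition Cache entries, reducing overall overhead.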
+
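+## Notes
+
+- **The cache is not persistent**: Condition Cache is cleared when Doris restarts.
+- **Delete operations disable caching**: Segments involving delete markers must guarantee strong consistency, so they do not use Condition Cache.
+
+## Summary
+
+Condition Cache is an optimization mechanism in Doris for **queries with repeated conditions**. Its advantages are:
+
+- Avoiding redundant computation and reducing CPU/I/O consumption
+- Taking effect automatically and transparently, without user intervention
+- Using little memory, with notable gains when the hit rate and filter rate are high
+
+By making good use of Condition Cache, users can obtain faster response times in high-frequency OLAP query scenarios.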
diff --git a/sidebars.json b/sidebars.json
index 7f272398f2a..06a70c19d10 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -322,6 +322,7 @@
                             ]
                         },
                         "query-acceleration/sql-cache-manual",
+                        "query-acceleration/condition-cache",
                         "query-acceleration/high-concurrent-point-query",
                         "query-acceleration/dictionary",
                         {
@@ -2529,4 +2530,4 @@
             ]
         }
     ]
-}
\ No newline at end of file
+}


