This is an automated email from the ASF dual-hosted git repository.
lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new b51e1797ed0 [Doc](exec) Support condition cache doc (#2963)
b51e1797ed0 is described below
commit b51e1797ed0685074a4bd418c91fa30dc3dad27f
Author: HappenLee
AuthorDate: Mon Oct 13 11:33:36 2025 +0800
[Doc](exec) Support condition cache doc (#2963)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/query-acceleration/condition-cache.md | 109 +++++++++++++++++++++
.../current/query-acceleration/condition-cache.md | 103 +++++++++++++++++++
sidebars.json | 3 +-
3 files changed, 214 insertions(+), 1 deletion(-)
diff --git a/docs/query-acceleration/condition-cache.md b/docs/query-acceleration/condition-cache.md
new file mode 100644
index 00000000000..6066ab9dc20
--- /dev/null
+++ b/docs/query-acceleration/condition-cache.md
@@ -0,0 +1,109 @@
+# Condition Cache
+
+## Introduction
+
+In large-scale analytical workloads, queries often include **repeated filter conditions**, for example:
+
+```
+SELECT * FROM orders WHERE region = 'ASIA';
+SELECT count(*) FROM orders WHERE region = 'ASIA';
+```
+
+Such queries evaluate the same filter on the same data segments again and again, leading to **redundant CPU and I/O overhead**.
+
+To address this, **Apache Doris introduces the Condition Cache mechanism**.
+It caches the filtering result of a specific condition on a given segment, allowing subsequent queries to **reuse that result directly**, which **avoids unnecessary scans and filtering operations** and significantly lowers query latency.
+
+## Working Principle
+
+The core concept of the Condition Cache is:
+
+- **The same filtering condition produces the same result on the same data segment.**
+- Doris generates a **64-bit digest** from the combination of “condition expression + key range”, which serves as a unique cache identifier.
+- Each segment can then look up existing filtering results in the cache using this digest.
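
The following is a minimal C++ sketch showing how a digest over “condition expression + key range” could be formed and then paired with a segment to build a cache lookup key. Several pieces are hypothetical stand-ins, because this page does not describe the digest algorithm Doris actually uses:

- the hash function (64-bit FNV-1a);
- the `KeyRange` and `CacheKey` types;
- the string encoding of the condition.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical encoded key range of a scan, e.g. serialized lower/upper bounds.
struct KeyRange {
    std::string lower;
    std::string upper;
};

// 64-bit FNV-1a, continuing from `h` so several fields can be chained.
// FNV-1a is an illustrative stand-in, not necessarily the hash Doris uses.
inline std::uint64_t fnv1a64(const std::string& bytes,
                             std::uint64_t h = 14695981039346656037ULL) {
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Digest of "condition expression + key range". A '\0' separator between
// fields keeps ("ab", "c") from hashing the same bytes as ("a", "bc").
inline std::uint64_t condition_digest(const std::string& condition, const KeyRange& range) {
    const std::string sep(1, '\0');
    std::uint64_t h = fnv1a64(condition);
    h = fnv1a64(sep, h);
    h = fnv1a64(range.lower, h);
    h = fnv1a64(sep, h);
    return fnv1a64(range.upper, h);
}

// Hypothetical cache key: every segment looks up the same digest under its own id.
struct CacheKey {
    std::uint64_t segment_id;
    std::uint64_t digest;
    bool operator==(const CacheKey& other) const {
        return segment_id == other.segment_id && digest == other.digest;
    }
};
```

Identical condition text over an identical key range always produces the same digest, so the same condition evaluated on the same segment maps to the same cache key.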
+
+Cached results are stored as compressed **bit vectors (`std::vector<bool>`)**:
+
+- **0** indicates that no row in the range meets the condition, so the range can be skipped directly;
+- **1** indicates that the range may contain matching rows and must be scanned further.
+
+Through this mechanism, Doris can quickly eliminate irrelevant data blocks at a coarse granularity, performing fine-grained filtering only when necessary.
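
Below is a sketch of the bit-vector idea, under stated assumptions. Each bit covers a fixed number of consecutive rows, set by `rows_per_bit`, which is a hypothetical parameter. The first execution evaluates the predicate and sets a bit to 1 for every range containing at least one match. Later executions read and filter exactly only the ranges marked 1. The function names are illustrative and are not Doris APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using RowRange = std::pair<std::size_t, std::size_t>;  // [begin, end)

// First execution: evaluate the predicate row by row and set bit i to 1 if any
// row in range i matched; ranges without a single match stay 0.
std::vector<bool> build_filter_bits(std::size_t num_rows, std::size_t rows_per_bit,
                                    const std::function<bool(std::size_t)>& matches) {
    assert(rows_per_bit > 0);
    std::vector<bool> bits((num_rows + rows_per_bit - 1) / rows_per_bit, false);
    for (std::size_t row = 0; row < num_rows; ++row) {
        if (matches(row)) bits[row / rows_per_bit] = true;
    }
    return bits;
}

// Later executions: turn cached bits into the row ranges that still need to be
// read and filtered exactly. Ranges marked 0 are skipped; adjacent 1s are merged.
std::vector<RowRange> ranges_to_scan(const std::vector<bool>& bits, std::size_t num_rows,
                                     std::size_t rows_per_bit) {
    std::vector<RowRange> ranges;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (!bits[i]) continue;
        const std::size_t begin = i * rows_per_bit;
        const std::size_t end = std::min(begin + rows_per_bit, num_rows);
        if (!ranges.empty() && ranges.back().second == begin) {
            ranges.back().second = end;
        } else {
            ranges.emplace_back(begin, end);
        }
    }
    return ranges;
}
```

For example, take 10 rows, 4 rows per bit, and matches only at rows 5 and 9. That yields the bits `0 1 1`, so a later scan reads only rows `[4, 10)` and skips rows 0 to 3 entirely.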
+
+## Applicable Scenarios
+
+Condition Cache is most effective in the following cases:
+
+- **Repeated conditions**: Identical or similar filter conditions are used frequently.
+- **Relatively stable data**: Data inside a segment is typically immutable; loads (INSERT) and compaction produce new segments, so cache entries for old segments naturally stop being used.
+- **High selectivity**: When filters keep only a small subset of rows, the reduction in scanned data is greatest.
+
+Condition Cache will **not** be used in the following situations:
+
+- Queries containing **delete predicates** (caching is disabled to ensure correctness).
+- **TopN runtime filters** generated at runtime (currently unsupported).
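
Here is a hypothetical eligibility check that mirrors the rules above. The `ScanContext` fields are invented for illustration. Treating a TopN runtime filter as disabling the cache for the whole scan is a conservative assumption, not documented behavior.

```cpp
#include <cassert>

// Hypothetical summary of what a segment scan knows about its predicates.
struct ScanContext {
    bool enable_condition_cache;   // value of the session variable enable_condition_cache
    bool has_delete_predicate;     // delete predicates apply to this scan
    bool has_topn_runtime_filter;  // a TopN runtime filter participates in this scan
};

// Returns true only when none of the documented exclusions apply.
inline bool can_use_condition_cache(const ScanContext& ctx) {
    if (!ctx.enable_condition_cache) return false;
    if (ctx.has_delete_predicate) return false;     // correctness first
    if (ctx.has_topn_runtime_filter) return false;  // currently unsupported
    return true;
}
```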
+
+## Configuration and Management
+
+### Enable or Disable
+
+```
+SET enable_condition_cache = true;
+```
+
+### Memory Management
+
+- Condition Cache uses an **LRU policy** for cache eviction.
+- When the cache exceeds `condition_cache_limit`, the least recently used entries are evicted automatically.
+
+You can change the memory limit (unit: MB) in `be.conf`:
+
+```
+condition_cache_limit = 1024
+```
+
+- After compaction, the cache entries of the replaced segments are no longer hit and are eventually removed by LRU eviction.
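
To make the eviction rule concrete, here is a minimal memory-bounded LRU cache in C++. It makes three simplifying assumptions:

- `ConditionLruCache` is a hypothetical name.
- Each entry is charged only for its packed bits (one bit per row range, rounded up to bytes), which understates real memory usage.
- Entries are keyed on the digest alone, without the segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal LRU cache for filter bit vectors, bounded by a limit given in MB.
class ConditionLruCache {
public:
    explicit ConditionLruCache(std::size_t limit_mb) : limit_bytes_(limit_mb * 1024 * 1024) {
        assert(limit_bytes_ > 0);
    }

    void insert(std::uint64_t key, std::vector<bool> bits) {
        erase(key);
        used_bytes_ += charge_of(bits);
        order_.emplace_front(key, std::move(bits));
        index_[key] = order_.begin();
        // Evict least recently used entries (list back) until under the limit.
        while (used_bytes_ > limit_bytes_ && !order_.empty()) {
            Entry& victim = order_.back();
            used_bytes_ -= charge_of(victim.second);
            index_.erase(victim.first);
            order_.pop_back();
        }
    }

    // Returns a copy of the cached bits and marks the entry most recently used.
    std::optional<std::vector<bool>> lookup(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<std::uint64_t, std::vector<bool>>;

    // Approximate charge: one bit per row range, rounded up to whole bytes.
    static std::size_t charge_of(const std::vector<bool>& bits) { return (bits.size() + 7) / 8; }

    void erase(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) return;
        used_bytes_ -= charge_of(it->second->second);
        order_.erase(it->second);
        index_.erase(it);
    }

    std::size_t limit_bytes_;
    std::size_t used_bytes_ = 0;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> index_;
};
```

In this model, entries belonging to segments that compaction has replaced are never looked up again. They therefore drift to the back of the list and are the first to be evicted once the limit is reached.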
+
+## Cache Statistics
+
+Doris provides several metrics to help users monitor the effectiveness of the Condition Cache:
+
+- **Profile-level metrics** (visible in the query Profile)
+ - `ConditionCacheSegmentHit`: Number of segments that hit the cache
+ - `ConditionCacheFilteredRows`: Number of rows skipped directly by cached results
+- **System metrics** (viewable via the monitoring system or `/metrics`)
+ - `condition_cache_search_count`: Total cache lookup count
+ - `condition_cache_hit_count`: Number of successful cache hits
+
+These metrics help evaluate the cache’s benefit and hit ratio.
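
The hit ratio follows directly from the two system metrics. The sketch below assumes both are monotonically increasing counters; check how your monitoring system exposes them. Under that assumption, the ratio over a time window should use the difference between two samples rather than the raw totals. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hit ratio = condition_cache_hit_count / condition_cache_search_count.
// Returns 0.0 when there were no lookups, to avoid dividing by zero.
inline double condition_cache_hit_ratio(std::uint64_t hit_count, std::uint64_t search_count) {
    assert(hit_count <= search_count);
    if (search_count == 0) return 0.0;
    return static_cast<double>(hit_count) / static_cast<double>(search_count);
}

// Hit ratio between two samples of the (assumed cumulative) counters.
inline double hit_ratio_between(std::uint64_t hit_before, std::uint64_t search_before,
                                std::uint64_t hit_after, std::uint64_t search_after) {
    assert(hit_after >= hit_before && search_after >= search_before);
    return condition_cache_hit_ratio(hit_after - hit_before, search_after - search_before);
}
```

For example, 150 hits out of 200 lookups gives a hit ratio of 0.75.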
+
+## Usage Example
+
+### Typical Scenario
+
+Consider the following query:
+
+```
+SELECT order_id, amount
+FROM orders
+WHERE region = 'ASIA' AND order_date >= '2023-01-01';
+```
+
+- **First execution**: The query performs a full scan and evaluates the filter; the Condition Cache stores the result in the LRU cache.
+- **Subsequent identical queries**: They reuse the cached results, skipping most irrelevant row ranges and scanning only potential matches.
+
+When multiple queries share the same filtering condition (e.g., `region = 'ASIA' AND order_date >= '2023-01-01'`), they can reuse each other’s Condition Cache entries, reducing the overall workload.
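
The lookup-or-compute flow in this example can be simulated in a few lines. The sketch makes two simplifications:

- It keys the cache on `(segment id, condition text)` instead of a real 64-bit digest.
- The hypothetical `Stats` fields borrow the metric names from the Cache Statistics section, but do not imply this is how Doris counts them.

Running the same condition twice over three segments evaluates the predicate once per segment. The second pass then hits the cache for every segment.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Counters named after the system metrics, plus how often the predicate ran.
struct Stats {
    std::uint64_t search_count = 0;
    std::uint64_t hit_count = 0;
    std::uint64_t evaluations = 0;
};

// Hypothetical cache: (segment id, condition text) -> per-row-range bits.
using Cache = std::map<std::pair<int, std::string>, std::vector<bool>>;

// Scan one segment: reuse cached bits on a hit, otherwise evaluate and store them.
const std::vector<bool>& scan_segment(Cache& cache, Stats& stats, int segment_id,
                                      const std::string& condition,
                                      const std::function<std::vector<bool>()>& evaluate) {
    ++stats.search_count;
    const auto key = std::make_pair(segment_id, condition);
    auto it = cache.find(key);
    if (it != cache.end()) {
        ++stats.hit_count;
        return it->second;
    }
    ++stats.evaluations;
    return cache.emplace(key, evaluate()).first->second;
}
```

Because the key depends only on the segment and the condition, a `SELECT *` and a `SELECT count(*)` with the same `WHERE` clause would share entries in this model.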
+
+## Notes
+
+- **Cache is not persistent**: The Condition Cache is cleared when Doris restarts.
+- **Delete operations disable caching**: Segments with delete markers require strict consistency and therefore do not use the cache.
+
+## Summary
+
+Condition Cache is an optimization mechanism in Doris designed for **repeated conditional queries**. Its advantages include:
+
+- Avoiding redundant computation and reducing CPU/I/O overhead
+- Taking effect automatically and transparently, without user intervention
+- A small memory footprint, with the largest gains when hit rates and filter rates are high
+
+By using the Condition Cache effectively, users can achieve significantly faster response times in high-frequency OLAP query scenarios.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/condition-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/condition-cache.md
new file mode 100644
index 00000000000..cbf900f30aa
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/condition-cache.md
@@ -0,0 +1,103 @@
+# Condition Cache
+
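+## Introduction
+
+In large-scale analytical workloads, queries often contain repeated filter conditions (Conditions), for example:
+
+```
+SELECT * FROM orders WHERE region = 'ASIA';
+SELECT count(*) FROM orders WHERE region = 'ASIA';
+```
+
+Such queries repeatedly execute the same filtering logic on the same data segments (Segments), causing **redundant CPU and IO overhead**.
+
+To address this, Apache Doris introduces the **Condition Cache** mechanism. It caches the filtering results of specific conditions on a given Segment and reuses them directly in subsequent queries, thereby **reducing unnecessary scans and filtering** and significantly lowering query latency.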
+
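+## Working Principle
+
+The core idea of the Condition Cache is:
+
+- **The same filter condition on the same data segment always yields the same result**.
+- Doris derives a **64-bit digest** from “condition expression + Key Range” and uses it as the unique cache identifier.
+- Each Segment can use this digest to look up existing filtering results in the cache.
+
+Cached results are stored as compressed **bit vectors (std::vector<bool>)**:
+
+- **0** means the row range does not satisfy the condition and can be skipped directly;
+- **1** means the range may contain matching data and must be scanned further.
+
+In this way, Doris can quickly discard irrelevant data blocks at a coarse granularity and perform exact filtering only when necessary.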
+
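+## Applicable Scenarios
+
+Condition Cache is most effective in the following scenarios:
+
+- **Repeated conditions**: The same or similar filter conditions are used frequently.
+
+- **Relatively stable data**: Data inside a Segment is usually immutable (INSERT/Compaction generates new Segments, so old cache entries are naturally phased out).
+
+- **High selectivity**: When a filter keeps only a small number of rows, the reduction in scanning is maximized.
+
+Condition Cache is not used in the following scenarios:
+
+- Queries containing **Delete conditions** (caching is disabled because delete markers must be handled correctly).
+- **TopN Runtime Filters** generated at runtime (not yet supported).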
+
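+## Configuration and Management
+
+### Enable or Disable
+
+```
+set enable_condition_cache = true;
+```
+
+### Memory Management
+
+- Condition Cache uses an **LRU policy** for cache eviction.
+- When `condition_cache_limit` is exceeded, the least recently used entries are evicted automatically.
+
+To change the limit, set `condition_cache_limit = 1024` in `be.conf`; the unit is MB.
+
+- After Segment Compaction, old cache entries also become invalid naturally as they are evicted by LRU.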
+
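+## Cache Statistics
+
+Doris provides a rich set of metrics so that users can observe how well the Condition Cache works:
+
+- **Profile-level metrics** (visible in the query Profile)
+ - `ConditionCacheSegmentHit`: Number of Segments that hit the cache
+ - `ConditionCacheFilteredRows`: Number of rows filtered out directly by the cache
+- **System metrics** (viewable through the monitoring system or `metrics`)
+ - `condition_cache_search_count`: Number of cache lookups
+ - `condition_cache_hit_count`: Number of cache hits
+
+Users can evaluate the benefit and hit ratio of the Condition Cache with these metrics.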
+
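+## Usage Example
+
+### Typical Scenario
+
+Suppose we have the following query:
+
+```
+SELECT order_id, amount
+FROM orders
+WHERE region = 'ASIA' AND order_date >= '2023-01-01';
+```
+
+- **First execution**: The data must be scanned in full and the condition evaluated; the Condition Cache stores the result in the LRU cache.
+- **Subsequent identical queries**: The cache is used directly, skipping most irrelevant row ranges and scanning only the parts that may satisfy the condition.
+
+When multiple queries share the same filter condition (for example `region = 'ASIA' AND order_date >= '2023-01-01'`), they can also reuse each other's Condition Cache entries, reducing overall overhead.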
+
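+## Notes
+
+- **The cache is not persisted**: The Condition Cache is cleared after Doris restarts.
+- **Delete operations disable the cache**: Segments involving delete markers must guarantee strong consistency, so they do not use the Condition Cache.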
+
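+## Summary
+
+Condition Cache is an optimization mechanism in Doris for **repeated conditional queries**. Its advantages are:
+
+- Avoids redundant computation and lowers CPU/IO consumption
+- Takes effect automatically and transparently, without user intervention
+- Small memory footprint, with significant gains when hit and filter rates are high
+
+By making good use of the Condition Cache, users can get faster response times in high-frequency OLAP query scenarios.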
diff --git a/sidebars.json b/sidebars.json
index 7f272398f2a..06a70c19d10 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -322,6 +322,7 @@
]
},
"query-acceleration/sql-cache-manual",
+ "query-acceleration/condition-cache",
"query-acceleration/high-concurrent-point-query",
"query-acceleration/dictionary",
{
@@ -2529,4 +2530,4 @@
]
}
]
-}
\ No newline at end of file
+}