This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new e8e4e30334c [docs](data-cache) Add data-cache-warmup documents. (#3165)
e8e4e30334c is described below
commit e8e4e30334c6ac399aae2395c08d4da1c467dcd7
Author: Qi Chen <[email protected]>
AuthorDate: Mon Dec 8 21:35:24 2025 +0800
[docs](data-cache) Add data-cache-warmup documents. (#3165)
## Versions
- [x] dev
- [ ] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---------
Co-authored-by: Mingyu Chen (Rayner) <[email protected]>
---
docs/lakehouse/best-practices/optimization.md | 2 +
docs/lakehouse/data-cache.md | 78 ++++++++++++++++++++++
.../storage-management/WARM-UP.md | 3 +-
.../lakehouse/best-practices/optimization.md | 2 +
.../current/lakehouse/data-cache.md | 78 ++++++++++++++++++++++
.../storage-management/WARM-UP.md | 2 +
.../lakehouse/best-practices/optimization.md | 2 +
.../lakehouse/best-practices/optimization.md | 2 +
.../storage-management/WARM-UP.md | 2 +
.../lakehouse/best-practices/optimization.md | 2 +
.../version-4.x/lakehouse/data-cache.md | 78 ++++++++++++++++++++++
.../storage-management/WARM-UP.md | 2 +
.../lakehouse/best-practices/optimization.md | 2 +
.../lakehouse/best-practices/optimization.md | 2 +
.../storage-management/WARM-UP.md | 3 +-
.../lakehouse/best-practices/optimization.md | 2 +
versioned_docs/version-4.x/lakehouse/data-cache.md | 78 ++++++++++++++++++++++
.../storage-management/WARM-UP.md | 3 +-
18 files changed, 340 insertions(+), 3 deletions(-)
diff --git a/docs/lakehouse/best-practices/optimization.md b/docs/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/docs/lakehouse/best-practices/optimization.md
+++ b/docs/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/docs/lakehouse/data-cache.md b/docs/lakehouse/data-cache.md
index 7fbeb114bbb..f2c937baee8 100644
--- a/docs/lakehouse/data-cache.md
+++ b/docs/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ If `BytesScannedFromRemote` is 0, it means the cache is fully hit.
Users can view cache statistics for each Backend node through the system table [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics).
+## Cache Warmup
+
+Data Cache provides a cache "warmup" feature that allows preloading external data into the local cache of BE nodes, thereby improving cache hit rates and query performance for subsequent first-time queries.
+
+> This feature is supported since version 4.0.2.
+
+### Syntax
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
+Usage restrictions:
+
+* Supported:
+
+ * Single table queries (only one table_reference allowed)
+ * Simple SELECT for specified columns
+ * WHERE filtering (supports regular predicates)
+
+* Not supported:
+
+ * JOIN, UNION, subqueries, CTE
+ * GROUP BY, HAVING, ORDER BY
+ * LIMIT
+ * INTO OUTFILE
+ * Multi-table / complex query plans
+ * Other complex syntax
+
+### Examples
+
+1. Warm up the entire table
+
+ ```sql
+ WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+ ```
+
+2. Warm up partial columns by partition
+
+ ```sql
+ WARM UP SELECT l_orderkey, l_shipmode
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE dt = '2025-01-01';
+ ```
+3. Warm up partial columns by filter conditions
+
+ ```sql
+ WARM UP SELECT l_shipmode, l_linestatus
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE l_orderkey = 123456;
+ ```
+
+### Execution Results
+
+After executing `WARM UP SELECT`, the FE dispatches tasks to each BE. The BE scans remote data and writes it to Data Cache.
+
+The system directly returns scan and cache write statistics for each BE (Note: Statistics are generally accurate but may have some margin of error). For example:
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+Field explanations:
+
+* ScanRows: Number of rows scanned and read.
+* ScanBytes: Amount of data scanned and read.
+* ScanBytesFromLocalStorage: Amount of data scanned and read from local cache.
+* ScanBytesFromRemoteStorage: Amount of data scanned and read from remote storage.
+* BytesWriteIntoCache: Amount of data written to Data Cache during this warmup.
+
## Appendix
### Principle
diff --git a/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 87f94808591..be303e7d28a 100644
--- a/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,7 @@
The `WARM UP COMPUTE GROUP` statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
+> For information on how to warmup the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
## Syntax
@@ -55,4 +56,4 @@ warm_up_item ::= TABLE <table_name> [PARTITION <partition_name>];
AND TABLE customer_info
AND TABLE orders PARTITION q1_2024;
-```
\ No newline at end of file
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup is supported, which proactively uses the data cache to further improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md
index 1785ebdd160..c54419a9f83 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ SET GLOBAL enable_file_cache = true;
Users can view cache statistics for each Backend node through the system table [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics).
+## Cache Warmup
+
+Data Cache provides a cache "warmup" feature that allows preloading external data into the local cache of BE nodes, thereby improving cache hit rates and query performance for subsequent first-time queries.
+
+> This feature is supported since version 4.0.2.
+
+### Syntax
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
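+Usage restrictions:
+
+* Supported:
+
+ * Single table queries (only one table_reference allowed)
+ * Simple SELECT for specified columns
+ * WHERE filtering (supports regular predicates)
+
+* Not supported:
+
+ * JOIN, UNION, subqueries, CTE
+ * GROUP BY, HAVING, ORDER BY
+ * LIMIT
+ * INTO OUTFILE
+ * Multi-table / complex query plans
+ * Other complex syntax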
+
+### Examples
+
+1. Warm up the entire table
+
+ ```sql
+ WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+ ```
+
+2. Warm up partial columns by partition
+
+ ```sql
+ WARM UP SELECT l_orderkey, l_shipmode
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE dt = '2025-01-01';
+ ```
+3. Warm up partial columns by filter conditions
+
+ ```sql
+ WARM UP SELECT l_shipmode, l_linestatus
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE l_orderkey = 123456;
+ ```
+
+### Execution Results
+
+After executing `WARM UP SELECT`, the FE dispatches tasks to each BE. The BE scans remote data and writes it to Data Cache.
+
+The system directly returns scan and cache write statistics for each BE (Note: Statistics are generally accurate but may have some margin of error). For example:
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+Field explanations:
+
+* ScanRows: Number of rows scanned and read.
+* ScanBytes: Amount of data scanned and read.
+* ScanBytesFromLocalStorage: Amount of data scanned and read from local cache.
+* ScanBytesFromRemoteStorage: Amount of data scanned and read from remote storage.
+* BytesWriteIntoCache: Amount of data written to Data Cache during this warmup.
+
## Appendix
### Principle
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 1d10f5288e0..202ae67e3c0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,8 @@
The WARM UP COMPUTE GROUP statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
+> For information on how to warm up the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
+
## Syntax
```sql
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup is supported, which proactively uses the data cache to further improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup is supported, which proactively uses the data cache to further improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 1d10f5288e0..202ae67e3c0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,8 @@
The WARM UP COMPUTE GROUP statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
+> For information on how to warm up the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
+
## Syntax
```sql
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup is supported, which proactively uses the data cache to further improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md
index 1785ebdd160..c54419a9f83 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ SET GLOBAL enable_file_cache = true;
Users can view cache statistics for each Backend node through the system table [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics).
+## Cache Warmup
+
+Data Cache provides a cache "warmup" feature that allows preloading external data into the local cache of BE nodes, thereby improving cache hit rates and query performance for subsequent first-time queries.
+
+> This feature is supported since version 4.0.2.
+
+### Syntax
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
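+Usage restrictions:
+
+* Supported:
+
+ * Single table queries (only one table_reference allowed)
+ * Simple SELECT for specified columns
+ * WHERE filtering (supports regular predicates)
+
+* Not supported:
+
+ * JOIN, UNION, subqueries, CTE
+ * GROUP BY, HAVING, ORDER BY
+ * LIMIT
+ * INTO OUTFILE
+ * Multi-table / complex query plans
+ * Other complex syntax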
+
+### Examples
+
+1. Warm up the entire table
+
+ ```sql
+ WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+ ```
+
+2. Warm up partial columns by partition
+
+ ```sql
+ WARM UP SELECT l_orderkey, l_shipmode
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE dt = '2025-01-01';
+ ```
+3. Warm up partial columns by filter conditions
+
+ ```sql
+ WARM UP SELECT l_shipmode, l_linestatus
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE l_orderkey = 123456;
+ ```
+
+### Execution Results
+
+After executing `WARM UP SELECT`, the FE dispatches tasks to each BE. The BE scans remote data and writes it to Data Cache.
+
+The system directly returns scan and cache write statistics for each BE (Note: Statistics are generally accurate but may have some margin of error). For example:
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+Field explanations:
+
+* ScanRows: Number of rows scanned and read.
+* ScanBytes: Amount of data scanned and read.
+* ScanBytesFromLocalStorage: Amount of data scanned and read from local cache.
+* ScanBytesFromRemoteStorage: Amount of data scanned and read from remote storage.
+* BytesWriteIntoCache: Amount of data written to Data Cache during this warmup.
+
## Appendix
### Principle
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 1d10f5288e0..202ae67e3c0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,8 @@
The WARM UP COMPUTE GROUP statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
+> For information on how to warm up the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
+
## Syntax
```sql
diff --git a/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md b/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md b/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 87f94808591..be303e7d28a 100644
--- a/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,7 @@
The `WARM UP COMPUTE GROUP` statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
+> For information on how to warmup the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
## Syntax
@@ -55,4 +56,4 @@ warm_up_item ::= TABLE <table_name> [PARTITION <partition_name>];
AND TABLE customer_info
AND TABLE orders PARTITION q1_2024;
-```
\ No newline at end of file
+```
diff --git a/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md b/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
## HDFS Read Optimization
Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/versioned_docs/version-4.x/lakehouse/data-cache.md b/versioned_docs/version-4.x/lakehouse/data-cache.md
index 7fbeb114bbb..f2c937baee8 100644
--- a/versioned_docs/version-4.x/lakehouse/data-cache.md
+++ b/versioned_docs/version-4.x/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ If `BytesScannedFromRemote` is 0, it means the cache is fully hit.
Users can view cache statistics for each Backend node through the system table [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics).
+## Cache Warmup
+
+Data Cache provides a cache "warmup" feature that allows preloading external data into the local cache of BE nodes, thereby improving cache hit rates and query performance for subsequent first-time queries.
+
+> This feature is supported since version 4.0.2.
+
+### Syntax
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
+Usage restrictions:
+
+* Supported:
+
+ * Single table queries (only one table_reference allowed)
+ * Simple SELECT for specified columns
+ * WHERE filtering (supports regular predicates)
+
+* Not supported:
+
+ * JOIN, UNION, subqueries, CTE
+ * GROUP BY, HAVING, ORDER BY
+ * LIMIT
+ * INTO OUTFILE
+ * Multi-table / complex query plans
+ * Other complex syntax
+
+### Examples
+
+1. Warm up the entire table
+
+ ```sql
+ WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+ ```
+
+2. Warm up partial columns by partition
+
+ ```sql
+ WARM UP SELECT l_orderkey, l_shipmode
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE dt = '2025-01-01';
+ ```
+3. Warm up partial columns by filter conditions
+
+ ```sql
+ WARM UP SELECT l_shipmode, l_linestatus
+ FROM hive_db.tpch100_parquet.lineitem
+ WHERE l_orderkey = 123456;
+ ```
+
+### Execution Results
+
+After executing `WARM UP SELECT`, the FE dispatches tasks to each BE. The BE scans remote data and writes it to Data Cache.
+
+The system directly returns scan and cache write statistics for each BE (Note: Statistics are generally accurate but may have some margin of error). For example:
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+Field explanations:
+
+* ScanRows: Number of rows scanned and read.
+* ScanBytes: Amount of data scanned and read.
+* ScanBytesFromLocalStorage: Amount of data scanned and read from local cache.
+* ScanBytesFromRemoteStorage: Amount of data scanned and read from remote storage.
+* BytesWriteIntoCache: Amount of data written to Data Cache during this warmup.
+
## Appendix
### Principle
diff --git a/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 87f94808591..be303e7d28a 100644
--- a/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,7 @@
The `WARM UP COMPUTE GROUP` statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
+> For information on how to warmup the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
## Syntax
@@ -55,4 +56,4 @@ warm_up_item ::= TABLE <table_name> [PARTITION <partition_name>];
AND TABLE customer_info
AND TABLE orders PARTITION q1_2024;
-```
\ No newline at end of file
+```