(doris-website) branch master updated: [docs](data-cache) Add data-cache-warmup documents. (#3165)

morningman Mon, 08 Dec 2025 05:35:46 -0800

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new e8e4e30334c [docs](data-cache) Add data-cache-warmup documents. (#3165)
e8e4e30334c is described below

commit e8e4e30334c6ac399aae2395c08d4da1c467dcd7
Author: Qi Chen <[email protected]>
AuthorDate: Mon Dec 8 21:35:24 2025 +0800

    [docs](data-cache) Add data-cache-warmup documents. (#3165)
    
    ## Versions
    
    - [x] dev
    - [ ] 4.x
    - [ ] 3.x
    - [ ] 2.1
    
    ## Languages
    
    - [x] Chinese
    - [ ] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
    
    ---------
    
    Co-authored-by: Mingyu Chen (Rayner) <[email protected]>
---
 docs/lakehouse/best-practices/optimization.md      |  2 +
 docs/lakehouse/data-cache.md                       | 78 ++++++++++++++++++++++
 .../storage-management/WARM-UP.md                  |  3 +-
 .../lakehouse/best-practices/optimization.md       |  2 +
 .../current/lakehouse/data-cache.md                | 78 ++++++++++++++++++++++
 .../storage-management/WARM-UP.md                  |  2 +
 .../lakehouse/best-practices/optimization.md       |  2 +
 .../lakehouse/best-practices/optimization.md       |  2 +
 .../storage-management/WARM-UP.md                  |  2 +
 .../lakehouse/best-practices/optimization.md       |  2 +
 .../version-4.x/lakehouse/data-cache.md            | 78 ++++++++++++++++++++++
 .../storage-management/WARM-UP.md                  |  2 +
 .../lakehouse/best-practices/optimization.md       |  2 +
 .../lakehouse/best-practices/optimization.md       |  2 +
 .../storage-management/WARM-UP.md                  |  3 +-
 .../lakehouse/best-practices/optimization.md       |  2 +
 versioned_docs/version-4.x/lakehouse/data-cache.md | 78 ++++++++++++++++++++++
 .../storage-management/WARM-UP.md                  |  3 +-
 18 files changed, 340 insertions(+), 3 deletions(-)

diff --git a/docs/lakehouse/best-practices/optimization.md b/docs/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/docs/lakehouse/best-practices/optimization.md
+++ b/docs/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
 
 The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
 
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
 ## HDFS Read Optimization
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/docs/lakehouse/data-cache.md b/docs/lakehouse/data-cache.md
index 7fbeb114bbb..f2c937baee8 100644
--- a/docs/lakehouse/data-cache.md
+++ b/docs/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ If `BytesScannedFromRemote` is 0, it means the cache is fully hit.
 
 Users can view cache statistics for each Backend node through the system table [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics).
 
+## Cache Warmup
+
+Data Cache provides a cache "warmup" feature that allows preloading external data into the local cache of BE nodes, thereby improving cache hit rates and query performance for subsequent first-time queries.
+
+> This feature is supported since version 4.0.2.
+
+### Syntax
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
+Usage restrictions:
+
+* Supported:
+
+  * Single table queries (only one table_reference allowed)
+  * Simple SELECT for specified columns
+  * WHERE filtering (supports regular predicates)
+
+* Not supported:
+
+  * JOIN, UNION, subqueries, CTE
+  * GROUP BY, HAVING, ORDER BY
+  * LIMIT
+  * INTO OUTFILE
+  * Multi-table / complex query plans
+  * Other complex syntax
+
+### Examples
+
+1. Warm up the entire table
+
+  ```sql
+  WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+  ```
+
+2. Warm up partial columns by partition
+
+  ```sql
+  WARM UP SELECT l_orderkey, l_shipmode
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE dt = '2025-01-01';
+  ```
+3. Warm up partial columns by filter conditions
+
+  ```sql
+  WARM UP SELECT l_shipmode, l_linestatus
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE l_orderkey = 123456;
+  ```
+
+### Execution Results
+
+After executing `WARM UP SELECT`, the FE dispatches tasks to each BE. The BE scans remote data and writes it to Data Cache.
+
+The system directly returns scan and cache write statistics for each BE (Note: Statistics are generally accurate but may have some margin of error). For example:
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+Field explanations:
+
+* ScanRows: Number of rows scanned and read.
+* ScanBytes: Amount of data scanned and read.
+* ScanBytesFromLocalStorage: Amount of data scanned and read from local cache.
+* ScanBytesFromRemoteStorage: Amount of data scanned and read from remote storage.
+* BytesWriteIntoCache: Amount of data written to Data Cache during this warmup.
+
 ## Appendix
 
 ### Principle
diff --git a/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 87f94808591..be303e7d28a 100644
--- a/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/docs/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,7 @@
 
 The `WARM UP COMPUTE GROUP` statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
 
+> For information on how to warmup the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
 
 ## Syntax
 
@@ -55,4 +56,4 @@ warm_up_item ::= TABLE <table_name> [PARTITION <partition_name>];
         AND TABLE customer_info 
         AND TABLE orders PARTITION q1_2024;
 
-```
\ No newline at end of file
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
 
 缓存功能默认是关闭的，请参阅 [数据缓存](../data-cache.md) 文档配置并开启。
 
+自 4.0.2 版本开始支持缓存预热功能，可以进一步主动利用数据缓存提升查询性能。
+
 ## HDFS 读取优化
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md
index 1785ebdd160..c54419a9f83 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ SET GLOBAL enable_file_cache = true;
 
 用户可以通过系统表 [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics) 查看各个 Backend 节点的缓存统计指标。
 
+## 缓存预热
+
+Data Cache 提供缓存“预热（Warmup）”功能，允许将外部数据提前加载到 BE 节点的本地缓存中，从而提升后续首次查询的命中率和查询性能。
+
+> 该功能自 4.0.2 版本支持。
+
+### 语法
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
+使用限制：
+
+* 支持：
+
+  * 单表查询（仅允许一个 table_reference）
+  * 指定列的简单 SELECT
+  * WHERE 过滤（支持常规谓词）
+
+* 不支持：
+
+  * JOIN、UNION、子查询、CTE
+  * GROUP BY、HAVING、ORDER BY
+  * LIMIT
+  * INTO OUTFILE
+  * 多表 / 复杂查询计划
+  * 其它复杂语法
+
+### 示例
+
+1. 预热整张表
+
+  ```sql
+  WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+  ```
+
+2. 根据分区预热部分列
+
+  ```sql
+  WARM UP SELECT l_orderkey, l_shipmode
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE dt = '2025-01-01';
+  ```
+3. 根据过滤条件预热部分列
+
+  ```sql
+  WARM UP SELECT l_shipmode, l_linestatus
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE l_orderkey = 123456;
+  ```
+
+### 执行返回结果
+
+执行 `WARM UP SELECT` 后，FE 会下发任务至各 BE。BE 扫描远端数据并写入 Data Cache。
+
+系统会直接返回各 BE 的扫描与缓存写入统计信息（注意：统计信息基本准确，但会有一定误差）。例如：
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+字段解释
+
+* ScanRows：扫描读取行数。
+* ScanBytes：扫描读取数据量。
+* ScanBytesFromLocalStorage：从本地缓存扫描读取的数据量。
+* ScanBytesFromRemoteStorage：从远端存储扫描读取的数据量。
+* BytesWriteIntoCache：本次预热写入 Data Cache 的数据量。
+
 ## 附录
 
 ### 原理
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 1d10f5288e0..202ae67e3c0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,8 @@
 
 WARM UP COMPUTE GROUP 语句用于预热计算组中的数据，以提高查询性能。预热操作可以从另一个计算组中获取资源，也可以指定特定的表和分区进行预热。预热操作返回一个作业 ID，可以用于追踪预热作业的状态。
 
+> 关于如何针对 Catalog 查询场景下预热缓存，请参阅 [Data Cache 文档](../../../../lakehouse/data-cache.md)。
+
 ## 语法
 
 ```sql
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
 
 缓存功能默认是关闭的，请参阅 [数据缓存](../data-cache.md) 文档配置并开启。
 
+自 4.0.2 版本开始支持缓存预热功能，可以进一步主动利用数据缓存提升查询性能。
+
 ## HDFS 读取优化
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
 
 缓存功能默认是关闭的，请参阅 [数据缓存](../data-cache.md) 文档配置并开启。
 
+自 4.0.2 版本开始支持缓存预热功能，可以进一步主动利用数据缓存提升查询性能。
+
 ## HDFS 读取优化
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 1d10f5288e0..202ae67e3c0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,8 @@
 
 WARM UP COMPUTE GROUP 语句用于预热计算组中的数据，以提高查询性能。预热操作可以从另一个计算组中获取资源，也可以指定特定的表和分区进行预热。预热操作返回一个作业 ID，可以用于追踪预热作业的状态。
 
+> 关于如何针对 Catalog 查询场景下预热缓存，请参阅 [Data Cache 文档](../../../../lakehouse/data-cache.md)。
+
 ## 语法
 
 ```sql
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
index 619e899e571..2f949d67235 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@
 
 缓存功能默认是关闭的，请参阅 [数据缓存](../data-cache.md) 文档配置并开启。
 
+自 4.0.2 版本开始支持缓存预热功能，可以进一步主动利用数据缓存提升查询性能。
+
 ## HDFS 读取优化
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md
index 1785ebdd160..c54419a9f83 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ SET GLOBAL enable_file_cache = true;
 
 用户可以通过系统表 [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics) 查看各个 Backend 节点的缓存统计指标。
 
+## 缓存预热
+
+Data Cache 提供缓存“预热（Warmup）”功能，允许将外部数据提前加载到 BE 节点的本地缓存中，从而提升后续首次查询的命中率和查询性能。
+
+> 该功能自 4.0.2 版本支持。
+
+### 语法
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
+使用限制：
+
+* 支持：
+
+  * 单表查询（仅允许一个 table_reference）
+  * 指定列的简单 SELECT
+  * WHERE 过滤（支持常规谓词）
+
+* 不支持：
+
+  * JOIN、UNION、子查询、CTE
+  * GROUP BY、HAVING、ORDER BY
+  * LIMIT
+  * INTO OUTFILE
+  * 多表 / 复杂查询计划
+  * 其它复杂语法
+
+### 示例
+
+1. 预热整张表
+
+  ```sql
+  WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+  ```
+
+2. 根据分区预热部分列
+
+  ```sql
+  WARM UP SELECT l_orderkey, l_shipmode
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE dt = '2025-01-01';
+  ```
+3. 根据过滤条件预热部分列
+
+  ```sql
+  WARM UP SELECT l_shipmode, l_linestatus
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE l_orderkey = 123456;
+  ```
+
+### 执行返回结果
+
+执行 `WARM UP SELECT` 后，FE 会下发任务至各 BE。BE 扫描远端数据并写入 Data Cache。
+
+系统会直接返回各 BE 的扫描与缓存写入统计信息（注意：统计信息基本准确，但会有一定误差）。例如：
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+字段解释
+
+* ScanRows：扫描读取行数。
+* ScanBytes：扫描读取数据量。
+* ScanBytesFromLocalStorage：从本地缓存扫描读取的数据量。
+* ScanBytesFromRemoteStorage：从远端存储扫描读取的数据量。
+* BytesWriteIntoCache：本次预热写入 Data Cache 的数据量。
+
 ## 附录
 
 ### 原理
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 1d10f5288e0..202ae67e3c0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,8 @@
 
 WARM UP COMPUTE GROUP 语句用于预热计算组中的数据，以提高查询性能。预热操作可以从另一个计算组中获取资源，也可以指定特定的表和分区进行预热。预热操作返回一个作业 ID，可以用于追踪预热作业的状态。
 
+> 关于如何针对 Catalog 查询场景下预热缓存，请参阅 [Data Cache 文档](../../../../lakehouse/data-cache.md)。
+
 ## 语法
 
 ```sql
diff --git a/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md b/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-2.1/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
 
 The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
 
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
 ## HDFS Read Optimization
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md b/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
 
 The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
 
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
 ## HDFS Read Optimization
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 87f94808591..be303e7d28a 100644
--- a/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,7 @@
 
 The `WARM UP COMPUTE GROUP` statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
 
+> For information on how to warmup the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
 
 ## Syntax
 
@@ -55,4 +56,4 @@ warm_up_item ::= TABLE <table_name> [PARTITION <partition_name>];
         AND TABLE customer_info 
         AND TABLE orders PARTITION q1_2024;
 
-```
\ No newline at end of file
+```
diff --git a/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md b/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
index 00f3dc21ec9..d52152d4d76 100644
--- a/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
@@ -29,6 +29,8 @@ Data Cache accelerates subsequent queries accessing the same data by caching rec
 
 The cache feature is disabled by default. Please refer to the [Data Cache](../data-cache.md) documentation to configure and enable it.
 
+Since version 4.0.2, cache warmup functionality is supported, which can further actively utilize data cache to improve query performance.
+
 ## HDFS Read Optimization
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS Documentation](../storages/hdfs.md).
diff --git a/versioned_docs/version-4.x/lakehouse/data-cache.md b/versioned_docs/version-4.x/lakehouse/data-cache.md
index 7fbeb114bbb..f2c937baee8 100644
--- a/versioned_docs/version-4.x/lakehouse/data-cache.md
+++ b/versioned_docs/version-4.x/lakehouse/data-cache.md
@@ -106,6 +106,84 @@ If `BytesScannedFromRemote` is 0, it means the cache is fully hit.
 
 Users can view cache statistics for each Backend node through the system table [`file_cache_statistics`](../admin-manual/system-tables/information_schema/file_cache_statistics).
 
+## Cache Warmup
+
+Data Cache provides a cache "warmup" feature that allows preloading external data into the local cache of BE nodes, thereby improving cache hit rates and query performance for subsequent first-time queries.
+
+> This feature is supported since version 4.0.2.
+
+### Syntax
+
+```sql
+WARM UP SELECT <select_expr_list>
+FROM <table_reference>
+[WHERE <boolean_expression>]
+```
+
+Usage restrictions:
+
+* Supported:
+
+  * Single table queries (only one table_reference allowed)
+  * Simple SELECT for specified columns
+  * WHERE filtering (supports regular predicates)
+
+* Not supported:
+
+  * JOIN, UNION, subqueries, CTE
+  * GROUP BY, HAVING, ORDER BY
+  * LIMIT
+  * INTO OUTFILE
+  * Multi-table / complex query plans
+  * Other complex syntax
+
+### Examples
+
+1. Warm up the entire table
+
+  ```sql
+  WARM UP SELECT * FROM hive_db.tpch100_parquet.lineitem;
+  ```
+
+2. Warm up partial columns by partition
+
+  ```sql
+  WARM UP SELECT l_orderkey, l_shipmode
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE dt = '2025-01-01';
+  ```
+3. Warm up partial columns by filter conditions
+
+  ```sql
+  WARM UP SELECT l_shipmode, l_linestatus
+  FROM hive_db.tpch100_parquet.lineitem
+  WHERE l_orderkey = 123456;
+  ```
+
+### Execution Results
+
+After executing `WARM UP SELECT`, the FE dispatches tasks to each BE. The BE scans remote data and writes it to Data Cache.
+
+The system directly returns scan and cache write statistics for each BE (Note: Statistics are generally accurate but may have some margin of error). For example:
+
+```
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| BackendId     | ScanRows  | ScanBytes   | ScanBytesFromLocalStorage | ScanBytesFromRemoteStorage | BytesWriteIntoCache |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+| 1755134092928 | 294744184 | 11821864798 | 538154009                 | 11283717130                | 11899799492         |
+| 1755134092929 | 305293718 | 12244439301 | 560970435                 | 11683475207                | 12332861380         |
+| TOTAL         | 600037902 | 24066304099 | 1099124444                | 22967192337                | 24232660872         |
++---------------+-----------+-------------+---------------------------+----------------------------+---------------------+
+```
+
+Field explanations:
+
+* ScanRows: Number of rows scanned and read.
+* ScanBytes: Amount of data scanned and read.
+* ScanBytesFromLocalStorage: Amount of data scanned and read from local cache.
+* ScanBytesFromRemoteStorage: Amount of data scanned and read from remote storage.
+* BytesWriteIntoCache: Amount of data written to Data Cache during this warmup.
+
 ## Appendix
 
 ### Principle
diff --git a/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md b/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
index 87f94808591..be303e7d28a 100644
--- a/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
+++ b/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/storage-management/WARM-UP.md
@@ -9,6 +9,7 @@
 
 The `WARM UP COMPUTE GROUP` statement is used to warm up data in a compute group to improve query performance. The warm-up operation can fetch resources from another compute group or specify particular tables and partitions for warming up. The warm-up operation returns a job ID that can be used to track the status of the warm-up job.
 
+> For information on how to warmup the cache for Catalog query scenarios, please refer to the [Data Cache documentation](../../../../lakehouse/data-cache.md).
 
 ## Syntax
 
@@ -55,4 +56,4 @@ warm_up_item ::= TABLE <table_name> [PARTITION <partition_name>];
         AND TABLE customer_info 
         AND TABLE orders PARTITION q1_2024;
 
-```
\ No newline at end of file
+```
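
The sample `WARM UP SELECT` result table added to data-cache.md can be sanity-checked with a few lines of Python. This is a minimal sketch and not part of the commit: the numbers are copied from the table in the patch, the helper names `column_totals` and `local_read_fraction` are hypothetical, and treating ScanBytesFromLocalStorage / ScanBytes as a rough indicator of how much data was already cached is an assumption, not a documented Doris metric.

```python
# Per-BE rows copied from the sample WARM UP SELECT output in the patch.
# Column order: ScanRows, ScanBytes, ScanBytesFromLocalStorage,
# ScanBytesFromRemoteStorage, BytesWriteIntoCache.
BE_ROWS = {
    "1755134092928": (294744184, 11821864798, 538154009, 11283717130, 11899799492),
    "1755134092929": (305293718, 12244439301, 560970435, 11683475207, 12332861380),
}
TOTAL_ROW = (600037902, 24066304099, 1099124444, 22967192337, 24232660872)


def column_totals(rows):
    """Sum each statistic across all backends, column by column."""
    return tuple(sum(col) for col in zip(*rows.values()))


def local_read_fraction(row):
    """Share of scanned bytes served from local cache (hypothetical metric)."""
    _, scan_bytes, from_local, _, _ = row
    return from_local / scan_bytes if scan_bytes else 0.0


if __name__ == "__main__":
    # The TOTAL row should equal the per-BE sums.
    print("totals match:", column_totals(BE_ROWS) == TOTAL_ROW)
    # Rough share of data that was already in the local cache before warmup.
    print(f"read from local cache: {local_read_fraction(TOTAL_ROW):.2%}")
```

In the sample data, the local and remote byte counts for each BE do not add up exactly to ScanBytes, which is consistent with the patch's note that the statistics carry some margin of error.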


