This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new af0963361f6 [opt](paimon) add paimon dlf 2.5 (#2738)
af0963361f6 is described below
commit af0963361f6cc697ea34fbb1892cee85fdbf9164
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Tue Aug 12 14:50:29 2025 -0700
[opt](paimon) add paimon dlf 2.5 (#2738)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../lakehouse/best-practices/doris-aws-s3tables.md | 2 +-
docs/lakehouse/best-practices/doris-dlf-paimon.md | 150 +++++++++++++++++++++
docs/lakehouse/catalogs/paimon-catalog.md | 27 ++--
.../lakehouse/best-practices/doris-dlf-paimon.md | 150 +++++++++++++++++++++
.../current/lakehouse/catalogs/paimon-catalog.md | 28 ++--
.../lakehouse/best-practices/doris-dlf-paimon.md | 150 +++++++++++++++++++++
.../lakehouse/catalogs/paimon-catalog.md | 28 ++--
.../lakehouse/best-practices/doris-dlf-paimon.md | 150 +++++++++++++++++++++
.../lakehouse/catalogs/paimon-catalog.md | 28 ++--
sidebars.json | 7 +-
.../lakehouse/best-practices/doris-aws-s3tables.md | 2 +-
.../lakehouse/best-practices/doris-dlf-paimon.md | 150 +++++++++++++++++++++
.../lakehouse/catalogs/paimon-catalog.md | 27 ++--
.../lakehouse/best-practices/doris-aws-s3tables.md | 2 +-
.../lakehouse/best-practices/doris-dlf-paimon.md | 150 +++++++++++++++++++++
.../lakehouse/catalogs/paimon-catalog.md | 27 ++--
versioned_sidebars/version-2.1-sidebars.json | 5 +-
versioned_sidebars/version-3.0-sidebars.json | 5 +-
18 files changed, 985 insertions(+), 103 deletions(-)
diff --git a/docs/lakehouse/best-practices/doris-aws-s3tables.md b/docs/lakehouse/best-practices/doris-aws-s3tables.md
index cb16b6f8bc3..1c349fe19b5 100644
--- a/docs/lakehouse/best-practices/doris-aws-s3tables.md
+++ b/docs/lakehouse/best-practices/doris-aws-s3tables.md
@@ -16,7 +16,7 @@ The release of S3 Tables further simplifies Lakehouse architecture and brings mo
Thanks to Amazon S3 Tables' high compatibility with the Iceberg API, Apache Doris can quickly integrate with S3 Tables. This article will demonstrate how to connect Apache Doris with S3 Tables and perform data analysis and processing.
:::tip
-This feature is supported from Doris 3.1 onwards
+This feature is supported since Doris 3.1
:::
## Usage Guide
diff --git a/docs/lakehouse/best-practices/doris-dlf-paimon.md b/docs/lakehouse/best-practices/doris-dlf-paimon.md
new file mode 100644
index 00000000000..91fcf3f4623
--- /dev/null
+++ b/docs/lakehouse/best-practices/doris-dlf-paimon.md
@@ -0,0 +1,150 @@
+---
+{
+ "title": "Integration with Alibaba DLF",
+ "language": "en"
+}
+
+---
+
+Alibaba Cloud [Data Lake Formation (DLF)](https://www.alibabacloud.com/en/product/datalake-formation) is a core component of cloud-native data lake architecture that helps users quickly build cloud-native data lakes. Data Lake Formation provides unified lake metadata management and enterprise-level permission control, and integrates seamlessly with multiple computing engines to break down data silos and unlock business value.
+
+- Unified Metadata and Storage
+
+ Computing engines share a unified set of lake metadata and storage, enabling data to flow between lake ecosystem products.
+
+- Unified Permission Management
+
+ Computing engines share a unified set of lake table permission configurations: configure once, take effect everywhere.
+
+- Storage Optimization
+
+ Provides optimization strategies such as small-file merging, expired-snapshot cleanup, partition reorganization, and obsolete-file cleanup to improve storage efficiency.
+
+- Comprehensive Cloud Ecosystem Support
+
+ Deep integration with Alibaba Cloud products, including streaming and batch computing engines, delivers out-of-the-box functionality and greater operational convenience.
+
+DLF supports the Paimon REST Catalog starting from version 2.5. Doris supports integrating with the Paimon REST Catalog of DLF 2.5+ starting from version 3.1.0, enabling a seamless connection to DLF for accessing and analyzing Paimon table data. This document demonstrates how to connect Apache Doris to DLF 2.5+ and access Paimon table data.
+
+:::tip
+This feature is supported since Doris 3.1
+:::
+
+## Usage Guide
+
+### 01 Enable DLF Service
+
+Please refer to the official DLF documentation to enable the DLF service and create the corresponding Catalog, Database, and Table.
+
+### 02 Access DLF Using EMR Spark SQL
+
+- Connection
+
+ ```shell
+ spark-sql --master yarn \
+ --conf spark.driver.memory=5g \
+ --conf spark.sql.defaultCatalog=paimon \
+ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
+ --conf spark.sql.catalog.paimon.metastore=rest \
+ --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
+ --conf spark.sql.catalog.paimon.uri=http://<region>-vpc.dlf.aliyuncs.com \
+ --conf spark.sql.catalog.paimon.warehouse=<your-catalog-name> \
+ --conf spark.sql.catalog.paimon.token.provider=dlf \
+ --conf spark.sql.catalog.paimon.dlf.token-loader=ecs
+ ```
+
+ > Replace the `warehouse` and `uri` values with your own.
+
+- Write Data
+
+ ```sql
+ USE <your-catalog-name>;
+
+ CREATE TABLE users_samples
+ (
+ user_id INT,
+ age_level STRING,
+ final_gender_code STRING,
+ clk BOOLEAN
+ );
+
+ INSERT INTO users_samples VALUES
+ (1, '25-34', 'M', true),
+ (2, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (3, '25-34', 'M', true),
+ (4, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (5, '25-34', 'M', true),
+ (6, '18-24', 'F', false);
+ ```
+
+ If you encounter the following error, try removing `paimon-jindo-x.y.z.jar` from `/opt/apps/PAIMON/paimon-dlf-2.5/lib/spark3` and restarting the Spark service before retrying.
+
+ ```
+ Ambiguous FileIO classes are:
+ org.apache.paimon.jindo.JindoLoader
+ org.apache.paimon.oss.OSSLoader
+ ```
+
+### 03 Connect Doris to DLF
+
+- Create Paimon Catalog
+
+ ```sql
+ CREATE CATALOG paimon_dlf_test PROPERTIES (
+ 'type' = 'paimon',
+ 'paimon.catalog.type' = 'rest',
+ 'uri' = 'http://<region>-vpc.dlf.aliyuncs.com',
+ 'warehouse' = '<your-catalog-name>',
+ 'paimon.rest.token.provider' = 'dlf',
+ 'paimon.rest.dlf.access-key-id' = '<ak>',
+ 'paimon.rest.dlf.access-key-secret' = '<sk>'
+ );
+ ```
+
+ - Doris uses temporary credentials returned by DLF to access OSS object storage; no additional OSS credentials are required.
+ - DLF can only be accessed from within the same VPC, so make sure the `uri` address is correct.
+
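+ After the catalog is created, a quick way to verify the connection is to switch to the catalog and list its databases. The following is a minimal verification sketch; `<your-database-name>` is a placeholder for a database created in DLF.
+
+ ```sql
+ -- Switch to the newly created catalog
+ SWITCH paimon_dlf_test;
+ -- List the databases synchronized from DLF
+ SHOW DATABASES;
+ -- Switch to the target database before querying
+ USE <your-database-name>;
+ ```
+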
+- Query Data
+
+ ```sql
+ SELECT * FROM users_samples ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 1 | 25-34 | M | 1 |
+ | 2 | 18-24 | F | 0 |
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ | 5 | 25-34 | M | 1 |
+ | 6 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
+
+- Query System Tables
+
+ ```sql
+ SELECT snapshot_id, commit_time, total_record_count FROM users_samples$snapshots;
+ +-------------+-------------------------+--------------------+
+ | snapshot_id | commit_time | total_record_count |
+ +-------------+-------------------------+--------------------+
+ | 1 | 2025-08-09 05:56:02.906 | 2 |
+ | 2 | 2025-08-13 03:41:32.732 | 4 |
+ | 3 | 2025-08-13 03:41:35.218 | 6 |
+ +-------------+-------------------------+--------------------+
+ ```
+
+- Batch Incremental Reading
+
+ ```sql
+ SELECT * FROM users_samples@incr('startSnapshotId'=1, 'endSnapshotId'=2) ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
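+
+ Incremental reads can also be bounded by commit timestamps instead of snapshot IDs. The sketch below is illustrative only; the millisecond timestamps are hypothetical and should be taken from the actual `commit_time` values in the snapshots system table.
+
+ ```sql
+ -- Read changes committed within [startTimestamp, endTimestamp), in milliseconds
+ SELECT * FROM users_samples@incr('startTimestamp'='1754719000000', 'endTimestamp'='1755054100000') ORDER BY user_id;
+ ```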
diff --git a/docs/lakehouse/catalogs/paimon-catalog.md b/docs/lakehouse/catalogs/paimon-catalog.md
index ce9253d81f7..f2df00afdb1 100644
--- a/docs/lakehouse/catalogs/paimon-catalog.md
+++ b/docs/lakehouse/catalogs/paimon-catalog.md
@@ -246,9 +246,6 @@ Supports [Batch Incremental](https://paimon.apache.org/docs/master/flink/sql-que
Supports querying incremental data within specified snapshot or timestamp intervals. The interval is left-closed and right-open.
```sql
--- read from snapshot 2
-SELECT * FROM paimon_table@incr('startSnapshotId'='2');
-
-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
@@ -256,29 +253,29 @@ SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');
-- read from start timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000');
-- read between timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000', 'endTimestamp'='1750944949000');
```
Parameter:
| Parameter | Description | Example |
| --- | --- | -- |
-| `startSnapshotId` | Starting snapshot ID, must be greater than 0 | `'startSnapshotId'='3'` |
-| `endSnapshotId` | Ending snapshot ID, must be greater than `startSnapshotId`. Optional, if not specified, reads from `startSnapshotId` to the latest snapshot | `'endSnapshotId'='10'` |
+| `startSnapshotId` | Starting snapshot ID, must be greater than 0. Must be specified together with `endSnapshotId`. | `'startSnapshotId'='3'` |
+| `endSnapshotId` | Ending snapshot ID, must be greater than `startSnapshotId`. Must be specified together with `startSnapshotId`. | `'endSnapshotId'='10'` |
| `incrementalBetweenScanMode` | Specifies the incremental read mode, default is `auto`, supports `delta`, `changelog` and `diff` | `'incrementalBetweenScanMode'='delta'` |
-| `startTimestamp` | Starting snapshot timestamp, must be greater than or equal to 0 | `'startTimestamp'='1750844949'` |
-| `endTimestamp` | Ending snapshot timestamp, must be greater than `startTimestamp`. Optional, if not specified, reads from `startTimestamp` to the latest snapshot | `'endTimestamp'='1750944949'` |
+| `startTimestamp` | Starting snapshot timestamp, must be greater than or equal to 0, in milliseconds. | `'startTimestamp'='1750844949000'` |
+| `endTimestamp` | Ending snapshot timestamp, must be greater than `startTimestamp`. Optional; if not specified, reads from `startTimestamp` to the latest snapshot. In milliseconds. | `'endTimestamp'='1750944949000'` |
> Notice:
-
-> - `startSnapshotId` and `endSnapshotId` will compose the Paimon parameter `'incremental-between'='3,10'`
-
-> - `startTimestamp` and `endTimestamp` will compose the Paimon parameter `'incremental-between-timestamp'='1750844949,1750944949'`
-
-> - `incrementalBetweenScanMode` corresponds to the Paimon parameter `incremental-between-scan-mode`.
+>
+> `startSnapshotId` and `endSnapshotId` are combined into the Paimon parameter `'incremental-between'='3,10'`
+>
+> `startTimestamp` and `endTimestamp` are combined into the Paimon parameter `'incremental-between-timestamp'='1750844949000,1750944949000'`
+>
+> `incrementalBetweenScanMode` corresponds to the Paimon parameter `incremental-between-scan-mode`.
Refer to the [Paimon documentation](https://paimon.apache.org/docs/master/maintenance/configurations/) for further details about these parameters.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/doris-dlf-paimon.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/doris-dlf-paimon.md
new file mode 100644
index 00000000000..ee2e9753643
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/doris-dlf-paimon.md
@@ -0,0 +1,150 @@
+---
+{
+ "title": "Doris 集成阿里云 DLF",
+ "language": "zh-CN"
+}
+
+---
+
+阿里云数据湖构建 [Data Lake Formation,DLF](https://cn.aliyun.com/product/bigdata/dlf) 作为云原生数据湖架构核心组成部分,帮助用户快速地构建云原生数据湖架构。数据湖构建提供湖上元数据统一管理、企业级权限控制,并无缝对接多种计算引擎,打破数据孤岛,洞察业务价值。
+
+- 统一元数据与存储
+
+ 大数据计算引擎共享一套湖上元数据和存储,且数据可在环湖产品间流动。
+
+- 统一权限管理
+
+ 大数据计算引擎共享一套湖表权限配置,实现一次配置,多处生效。
+
+- 存储优化
+
+ 提供小文件合并、过期快照清理、分区整理及废弃文件清理等优化策略,提升存储效率。
+
+- 完善的云生态支持体系
+
+ 深度整合阿里云产品,包括流批计算引擎,实现开箱即用,提升用户体验与操作便捷性。
+
+DLF 2.5 版本开始支持 Paimon Rest Catalog。Doris 自 3.1.0 版本开始,支持集成 DLF 2.5+ 版本的 Paimon Rest Catalog,可以无缝对接 DLF,访问并分析 Paimon 表数据。本文将演示如何使用 Apache Doris 对接 DLF 2.5+ 版本并进行 Paimon 表数据访问。
+
+:::tip
+该功能从 Doris 3.1 开始支持
+:::
+
+## 使用指南
+
+### 01 开通 DLF 服务
+
+请参考 DLF 官方文档开通 DLF 服务,并创建相应的 Catalog、Database 和 Table。
+
+### 02 使用 EMR Spark SQL 访问 DLF
+
+- 连接
+
+ ```shell
+ spark-sql --master yarn \
+ --conf spark.driver.memory=5g \
+ --conf spark.sql.defaultCatalog=paimon \
+ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
+ --conf spark.sql.catalog.paimon.metastore=rest \
+ --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
+ --conf spark.sql.catalog.paimon.uri=http://<region>-vpc.dlf.aliyuncs.com \
+ --conf spark.sql.catalog.paimon.warehouse=<your-catalog-name> \
+ --conf spark.sql.catalog.paimon.token.provider=dlf \
+ --conf spark.sql.catalog.paimon.dlf.token-loader=ecs
+ ```
+
+ > 替换对应的 `warehouse` 和 `uri` 地址。
+
+- 写入数据
+
+ ```sql
+ USE <your-catalog-name>;
+
+ CREATE TABLE users_samples
+ (
+ user_id INT,
+ age_level STRING,
+ final_gender_code STRING,
+ clk BOOLEAN
+ );
+
+ INSERT INTO users_samples VALUES
+ (1, '25-34', 'M', true),
+ (2, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (3, '25-34', 'M', true),
+ (4, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (5, '25-34', 'M', true),
+ (6, '18-24', 'F', false);
+ ```
+
+ 如遇到以下错误,请尝试移除 `/opt/apps/PAIMON/paimon-dlf-2.5/lib/spark3` 下的 `paimon-jindo-x.y.z.jar` 后重启 Spark 服务并重试。
+
+ ```
+ Ambiguous FileIO classes are:
+ org.apache.paimon.jindo.JindoLoader
+ org.apache.paimon.oss.OSSLoader
+ ```
+
+### 03 使用 Doris 连接 DLF
+
+- 创建 Paimon Catalog
+
+ ```sql
+ CREATE CATALOG paimon_dlf_test PROPERTIES (
+ 'type' = 'paimon',
+ 'paimon.catalog.type' = 'rest',
+ 'uri' = 'http://<region>-vpc.dlf.aliyuncs.com',
+ 'warehouse' = '<your-catalog-name>',
+ 'paimon.rest.token.provider' = 'dlf',
+ 'paimon.rest.dlf.access-key-id' = '<ak>',
+ 'paimon.rest.dlf.access-key-secret' = '<sk>'
+ );
+ ```
+
+ - Doris 会使用 DLF 返回的临时凭证访问 OSS 对象存储,不需要额外提供 OSS 的凭证信息。
+ - 仅支持在同 VPC 内访问 DLF,注意提供正确的 uri 地址。
+
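+ Catalog 创建完成后,可以切换到该 Catalog 并查看其中的数据库,以快速验证连通性。以下仅为一个简单的验证示例,其中 `<your-database-name>` 为在 DLF 中创建的数据库的占位符。
+
+ ```sql
+ -- 切换到新创建的 Catalog
+ SWITCH paimon_dlf_test;
+ -- 查看从 DLF 同步的数据库
+ SHOW DATABASES;
+ -- 切换到目标数据库后即可查询
+ USE <your-database-name>;
+ ```
+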
+- 查询数据
+
+ ```sql
+ SELECT * FROM users_samples ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 1 | 25-34 | M | 1 |
+ | 2 | 18-24 | F | 0 |
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ | 5 | 25-34 | M | 1 |
+ | 6 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
+
+- 查询系统表
+
+ ```sql
+ SELECT snapshot_id, commit_time, total_record_count FROM users_samples$snapshots;
+ +-------------+-------------------------+--------------------+
+ | snapshot_id | commit_time | total_record_count |
+ +-------------+-------------------------+--------------------+
+ | 1 | 2025-08-09 05:56:02.906 | 2 |
+ | 2 | 2025-08-13 03:41:32.732 | 4 |
+ | 3 | 2025-08-13 03:41:35.218 | 6 |
+ +-------------+-------------------------+--------------------+
+ ```
+
+- 增量读取
+
+ ```sql
+ SELECT * FROM users_samples@incr('startSnapshotId'=1, 'endSnapshotId'=2) ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
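+
+ 除快照 ID 外,也可以按提交时间戳区间进行增量读取。以下仅为示意,毫秒时间戳为假设值,应根据 snapshots 系统表中实际的 `commit_time` 取值调整。
+
+ ```sql
+ -- 读取 [startTimestamp, endTimestamp) 区间内提交的增量数据,时间戳单位为毫秒
+ SELECT * FROM users_samples@incr('startTimestamp'='1754719000000', 'endTimestamp'='1755054100000') ORDER BY user_id;
+ ```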
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.md
index cc6f194e6e3..07da5d94d99 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/paimon-catalog.md
@@ -246,9 +246,6 @@ SELECT * FROM paimon_ctl.paimon_db.paimon_tbl LIMIT 10;
支持查询指定的快照或时间戳区间内的增量数据。区间为左闭右开区间。
```sql
--- read from snapshot 2
-SELECT * FROM paimon_table@incr('startSnapshotId'='2');
-
-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
@@ -256,33 +253,32 @@ SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');
-- read from start timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000');
-- read between timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000', 'endTimestamp'='1750944949000');
```
参数说明:
| 参数 | 说明 | 示例 |
| --- | --- | -- |
-| `startSnapshotId` | 起始快照 ID,必须大于 0 | `'startSnapshotId'='3'` |
-| `endSnapshotId` | 结束快照 ID,必须大于 `startSnapshotId`。可选,如不指定,则表示从 `startSnapshotId` 开始读取到最新的快照 | `'endSnapshotId'='10'` |
+| `startSnapshotId` | 起始快照 ID,必须大于 0。必须和 `endSnapshotId` 配对使用。 | `'startSnapshotId'='3'` |
+| `endSnapshotId` | 结束快照 ID,必须大于 `startSnapshotId`。必须和 `startSnapshotId` 配对使用。 | `'endSnapshotId'='10'` |
| `incrementalBetweenScanMode` | 指定增量读取的模式,默认 `auto`,支持 `delta`, `changelog` 和 `diff` | `'incrementalBetweenScanMode'='delta'` |
-| `startTimestamp` | 起始快照时间,必须大于等于 0 | `'startTimestamp'='1750844949'` |
-| `endTimestamp` | 结束快照时间,必须大于 `startTimestamp`。可选,如不指定,则表示从 `startTimestamp` 开始读取到最新的快照 | `'endTimestamp'='1750944949'` |
+| `startTimestamp` | 起始快照时间,必须大于等于 0。单位是毫秒。 | `'startTimestamp'='1750844949000'` |
+| `endTimestamp` | 结束快照时间,必须大于 `startTimestamp`。可选,如不指定,则表示从 `startTimestamp` 开始读取到最新的快照。单位是毫秒。 | `'endTimestamp'='1750944949000'` |
> 注:
-
-> - `startSnapshotId` 和 `endSnapshotId` 会组成 Paimon 参数 `'incremental-between'='3,10'`
-
-> - `startTimestamp` 和 `endTimestamp` 会组成 Paimon 参数 `'incremental-between-timestamp'='1750844949,1750944949'`
-
-> - `incrementalBetweenScanMode` 对应 Paimon 参数 `incremental-between-scan-mode`。
+>
+> `startSnapshotId` 和 `endSnapshotId` 会组成 Paimon 参数 `'incremental-between'='3,10'`
+>
+> `startTimestamp` 和 `endTimestamp` 会组成 Paimon 参数 `'incremental-between-timestamp'='1750844949000,1750944949000'`
+>
+> `incrementalBetweenScanMode` 对应 Paimon 参数 `incremental-between-scan-mode`。
可参阅 [Paimon 文档](https://paimon.apache.org/docs/master/maintenance/configurations/) 进一步了解这些参数。
-
## 系统表
> 该功能自 3.1.0 版本支持
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/doris-dlf-paimon.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/doris-dlf-paimon.md
new file mode 100644
index 00000000000..ee2e9753643
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/best-practices/doris-dlf-paimon.md
@@ -0,0 +1,150 @@
+---
+{
+ "title": "Doris 集成阿里云 DLF",
+ "language": "zh-CN"
+}
+
+---
+
+阿里云数据湖构建 [Data Lake Formation,DLF](https://cn.aliyun.com/product/bigdata/dlf) 作为云原生数据湖架构核心组成部分,帮助用户快速地构建云原生数据湖架构。数据湖构建提供湖上元数据统一管理、企业级权限控制,并无缝对接多种计算引擎,打破数据孤岛,洞察业务价值。
+
+- 统一元数据与存储
+
+ 大数据计算引擎共享一套湖上元数据和存储,且数据可在环湖产品间流动。
+
+- 统一权限管理
+
+ 大数据计算引擎共享一套湖表权限配置,实现一次配置,多处生效。
+
+- 存储优化
+
+ 提供小文件合并、过期快照清理、分区整理及废弃文件清理等优化策略,提升存储效率。
+
+- 完善的云生态支持体系
+
+ 深度整合阿里云产品,包括流批计算引擎,实现开箱即用,提升用户体验与操作便捷性。
+
+DLF 2.5 版本开始支持 Paimon Rest Catalog。Doris 自 3.1.0 版本开始,支持集成 DLF 2.5+ 版本的 Paimon Rest Catalog,可以无缝对接 DLF,访问并分析 Paimon 表数据。本文将演示如何使用 Apache Doris 对接 DLF 2.5+ 版本并进行 Paimon 表数据访问。
+
+:::tip
+该功能从 Doris 3.1 开始支持
+:::
+
+## 使用指南
+
+### 01 开通 DLF 服务
+
+请参考 DLF 官方文档开通 DLF 服务,并创建相应的 Catalog、Database 和 Table。
+
+### 02 使用 EMR Spark SQL 访问 DLF
+
+- 连接
+
+ ```shell
+ spark-sql --master yarn \
+ --conf spark.driver.memory=5g \
+ --conf spark.sql.defaultCatalog=paimon \
+ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
+ --conf spark.sql.catalog.paimon.metastore=rest \
+ --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
+ --conf spark.sql.catalog.paimon.uri=http://<region>-vpc.dlf.aliyuncs.com \
+ --conf spark.sql.catalog.paimon.warehouse=<your-catalog-name> \
+ --conf spark.sql.catalog.paimon.token.provider=dlf \
+ --conf spark.sql.catalog.paimon.dlf.token-loader=ecs
+ ```
+
+ > 替换对应的 `warehouse` 和 `uri` 地址。
+
+- 写入数据
+
+ ```sql
+ USE <your-catalog-name>;
+
+ CREATE TABLE users_samples
+ (
+ user_id INT,
+ age_level STRING,
+ final_gender_code STRING,
+ clk BOOLEAN
+ );
+
+ INSERT INTO users_samples VALUES
+ (1, '25-34', 'M', true),
+ (2, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (3, '25-34', 'M', true),
+ (4, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (5, '25-34', 'M', true),
+ (6, '18-24', 'F', false);
+ ```
+
+ 如遇到以下错误,请尝试移除 `/opt/apps/PAIMON/paimon-dlf-2.5/lib/spark3` 下的 `paimon-jindo-x.y.z.jar` 后重启 Spark 服务并重试。
+
+ ```
+ Ambiguous FileIO classes are:
+ org.apache.paimon.jindo.JindoLoader
+ org.apache.paimon.oss.OSSLoader
+ ```
+
+### 03 使用 Doris 连接 DLF
+
+- 创建 Paimon Catalog
+
+ ```sql
+ CREATE CATALOG paimon_dlf_test PROPERTIES (
+ 'type' = 'paimon',
+ 'paimon.catalog.type' = 'rest',
+ 'uri' = 'http://<region>-vpc.dlf.aliyuncs.com',
+ 'warehouse' = '<your-catalog-name>',
+ 'paimon.rest.token.provider' = 'dlf',
+ 'paimon.rest.dlf.access-key-id' = '<ak>',
+ 'paimon.rest.dlf.access-key-secret' = '<sk>'
+ );
+ ```
+
+ - Doris 会使用 DLF 返回的临时凭证访问 OSS 对象存储,不需要额外提供 OSS 的凭证信息。
+ - 仅支持在同 VPC 内访问 DLF,注意提供正确的 uri 地址。
+
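+ Catalog 创建完成后,可以切换到该 Catalog 并查看其中的数据库,以快速验证连通性。以下仅为一个简单的验证示例,其中 `<your-database-name>` 为在 DLF 中创建的数据库的占位符。
+
+ ```sql
+ -- 切换到新创建的 Catalog
+ SWITCH paimon_dlf_test;
+ -- 查看从 DLF 同步的数据库
+ SHOW DATABASES;
+ -- 切换到目标数据库后即可查询
+ USE <your-database-name>;
+ ```
+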
+- 查询数据
+
+ ```sql
+ SELECT * FROM users_samples ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 1 | 25-34 | M | 1 |
+ | 2 | 18-24 | F | 0 |
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ | 5 | 25-34 | M | 1 |
+ | 6 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
+
+- 查询系统表
+
+ ```sql
+ SELECT snapshot_id, commit_time, total_record_count FROM users_samples$snapshots;
+ +-------------+-------------------------+--------------------+
+ | snapshot_id | commit_time | total_record_count |
+ +-------------+-------------------------+--------------------+
+ | 1 | 2025-08-09 05:56:02.906 | 2 |
+ | 2 | 2025-08-13 03:41:32.732 | 4 |
+ | 3 | 2025-08-13 03:41:35.218 | 6 |
+ +-------------+-------------------------+--------------------+
+ ```
+
+- 增量读取
+
+ ```sql
+ SELECT * FROM users_samples@incr('startSnapshotId'=1, 'endSnapshotId'=2) ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
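+
+ 除快照 ID 外,也可以按提交时间戳区间进行增量读取。以下仅为示意,毫秒时间戳为假设值,应根据 snapshots 系统表中实际的 `commit_time` 取值调整。
+
+ ```sql
+ -- 读取 [startTimestamp, endTimestamp) 区间内提交的增量数据,时间戳单位为毫秒
+ SELECT * FROM users_samples@incr('startTimestamp'='1754719000000', 'endTimestamp'='1755054100000') ORDER BY user_id;
+ ```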
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/catalogs/paimon-catalog.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/catalogs/paimon-catalog.md
index cc6f194e6e3..07da5d94d99 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/catalogs/paimon-catalog.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/catalogs/paimon-catalog.md
@@ -246,9 +246,6 @@ SELECT * FROM paimon_ctl.paimon_db.paimon_tbl LIMIT 10;
支持查询指定的快照或时间戳区间内的增量数据。区间为左闭右开区间。
```sql
--- read from snapshot 2
-SELECT * FROM paimon_table@incr('startSnapshotId'='2');
-
-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
@@ -256,33 +253,32 @@ SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');
-- read from start timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000');
-- read between timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000', 'endTimestamp'='1750944949000');
```
参数说明:
| 参数 | 说明 | 示例 |
| --- | --- | -- |
-| `startSnapshotId` | 起始快照 ID,必须大于 0 | `'startSnapshotId'='3'` |
-| `endSnapshotId` | 结束快照 ID,必须大于 `startSnapshotId`。可选,如不指定,则表示从 `startSnapshotId` 开始读取到最新的快照 | `'endSnapshotId'='10'` |
+| `startSnapshotId` | 起始快照 ID,必须大于 0。必须和 `endSnapshotId` 配对使用。 | `'startSnapshotId'='3'` |
+| `endSnapshotId` | 结束快照 ID,必须大于 `startSnapshotId`。必须和 `startSnapshotId` 配对使用。 | `'endSnapshotId'='10'` |
| `incrementalBetweenScanMode` | 指定增量读取的模式,默认 `auto`,支持 `delta`, `changelog` 和 `diff` | `'incrementalBetweenScanMode'='delta'` |
-| `startTimestamp` | 起始快照时间,必须大于等于 0 | `'startTimestamp'='1750844949'` |
-| `endTimestamp` | 结束快照时间,必须大于 `startTimestamp`。可选,如不指定,则表示从 `startTimestamp` 开始读取到最新的快照 | `'endTimestamp'='1750944949'` |
+| `startTimestamp` | 起始快照时间,必须大于等于 0。单位是毫秒。 | `'startTimestamp'='1750844949000'` |
+| `endTimestamp` | 结束快照时间,必须大于 `startTimestamp`。可选,如不指定,则表示从 `startTimestamp` 开始读取到最新的快照。单位是毫秒。 | `'endTimestamp'='1750944949000'` |
> 注:
-
-> - `startSnapshotId` 和 `endSnapshotId` 会组成 Paimon 参数 `'incremental-between'='3,10'`
-
-> - `startTimestamp` 和 `endTimestamp` 会组成 Paimon 参数 `'incremental-between-timestamp'='1750844949,1750944949'`
-
-> - `incrementalBetweenScanMode` 对应 Paimon 参数 `incremental-between-scan-mode`。
+>
+> `startSnapshotId` 和 `endSnapshotId` 会组成 Paimon 参数 `'incremental-between'='3,10'`
+>
+> `startTimestamp` 和 `endTimestamp` 会组成 Paimon 参数 `'incremental-between-timestamp'='1750844949000,1750944949000'`
+>
+> `incrementalBetweenScanMode` 对应 Paimon 参数 `incremental-between-scan-mode`。
可参阅 [Paimon 文档](https://paimon.apache.org/docs/master/maintenance/configurations/) 进一步了解这些参数。
-
## 系统表
> 该功能自 3.1.0 版本支持
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/best-practices/doris-dlf-paimon.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/best-practices/doris-dlf-paimon.md
new file mode 100644
index 00000000000..ee2e9753643
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/best-practices/doris-dlf-paimon.md
@@ -0,0 +1,150 @@
+---
+{
+ "title": "Doris 集成阿里云 DLF",
+ "language": "zh-CN"
+}
+
+---
+
+阿里云数据湖构建 [Data Lake Formation,DLF](https://cn.aliyun.com/product/bigdata/dlf) 作为云原生数据湖架构核心组成部分,帮助用户快速地构建云原生数据湖架构。数据湖构建提供湖上元数据统一管理、企业级权限控制,并无缝对接多种计算引擎,打破数据孤岛,洞察业务价值。
+
+- 统一元数据与存储
+
+ 大数据计算引擎共享一套湖上元数据和存储,且数据可在环湖产品间流动。
+
+- 统一权限管理
+
+ 大数据计算引擎共享一套湖表权限配置,实现一次配置,多处生效。
+
+- 存储优化
+
+ 提供小文件合并、过期快照清理、分区整理及废弃文件清理等优化策略,提升存储效率。
+
+- 完善的云生态支持体系
+
+ 深度整合阿里云产品,包括流批计算引擎,实现开箱即用,提升用户体验与操作便捷性。
+
+DLF 2.5 版本开始支持 Paimon Rest Catalog。Doris 自 3.1.0 版本开始,支持集成 DLF 2.5+ 版本的 Paimon Rest Catalog,可以无缝对接 DLF,访问并分析 Paimon 表数据。本文将演示如何使用 Apache Doris 对接 DLF 2.5+ 版本并进行 Paimon 表数据访问。
+
+:::tip
+该功能从 Doris 3.1 开始支持
+:::
+
+## 使用指南
+
+### 01 开通 DLF 服务
+
+请参考 DLF 官方文档开通 DLF 服务,并创建相应的 Catalog、Database 和 Table。
+
+### 02 使用 EMR Spark SQL 访问 DLF
+
+- 连接
+
+ ```shell
+ spark-sql --master yarn \
+ --conf spark.driver.memory=5g \
+ --conf spark.sql.defaultCatalog=paimon \
+ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
+ --conf spark.sql.catalog.paimon.metastore=rest \
+ --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
+ --conf spark.sql.catalog.paimon.uri=http://<region>-vpc.dlf.aliyuncs.com \
+ --conf spark.sql.catalog.paimon.warehouse=<your-catalog-name> \
+ --conf spark.sql.catalog.paimon.token.provider=dlf \
+ --conf spark.sql.catalog.paimon.dlf.token-loader=ecs
+ ```
+
+ > 替换对应的 `warehouse` 和 `uri` 地址。
+
+- 写入数据
+
+ ```sql
+ USE <your-catalog-name>;
+
+ CREATE TABLE users_samples
+ (
+ user_id INT,
+ age_level STRING,
+ final_gender_code STRING,
+ clk BOOLEAN
+ );
+
+ INSERT INTO users_samples VALUES
+ (1, '25-34', 'M', true),
+ (2, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (3, '25-34', 'M', true),
+ (4, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (5, '25-34', 'M', true),
+ (6, '18-24', 'F', false);
+ ```
+
+ 如遇到以下错误,请尝试移除 `/opt/apps/PAIMON/paimon-dlf-2.5/lib/spark3` 下的 `paimon-jindo-x.y.z.jar` 后重启 Spark 服务并重试。
+
+ ```
+ Ambiguous FileIO classes are:
+ org.apache.paimon.jindo.JindoLoader
+ org.apache.paimon.oss.OSSLoader
+ ```
+
+### 03 使用 Doris 连接 DLF
+
+- 创建 Paimon Catalog
+
+ ```sql
+ CREATE CATALOG paimon_dlf_test PROPERTIES (
+ 'type' = 'paimon',
+ 'paimon.catalog.type' = 'rest',
+ 'uri' = 'http://<region>-vpc.dlf.aliyuncs.com',
+ 'warehouse' = '<your-catalog-name>',
+ 'paimon.rest.token.provider' = 'dlf',
+ 'paimon.rest.dlf.access-key-id' = '<ak>',
+ 'paimon.rest.dlf.access-key-secret' = '<sk>'
+ );
+ ```
+
+ - Doris 会使用 DLF 返回的临时凭证访问 OSS 对象存储,不需要额外提供 OSS 的凭证信息。
+ - 仅支持在同 VPC 内访问 DLF,注意提供正确的 uri 地址。
+
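+ Catalog 创建完成后,可以切换到该 Catalog 并查看其中的数据库,以快速验证连通性。以下仅为一个简单的验证示例,其中 `<your-database-name>` 为在 DLF 中创建的数据库的占位符。
+
+ ```sql
+ -- 切换到新创建的 Catalog
+ SWITCH paimon_dlf_test;
+ -- 查看从 DLF 同步的数据库
+ SHOW DATABASES;
+ -- 切换到目标数据库后即可查询
+ USE <your-database-name>;
+ ```
+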
+- 查询数据
+
+ ```sql
+ SELECT * FROM users_samples ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 1 | 25-34 | M | 1 |
+ | 2 | 18-24 | F | 0 |
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ | 5 | 25-34 | M | 1 |
+ | 6 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
+
+- 查询系统表
+
+ ```sql
+ SELECT snapshot_id, commit_time, total_record_count FROM users_samples$snapshots;
+ +-------------+-------------------------+--------------------+
+ | snapshot_id | commit_time | total_record_count |
+ +-------------+-------------------------+--------------------+
+ | 1 | 2025-08-09 05:56:02.906 | 2 |
+ | 2 | 2025-08-13 03:41:32.732 | 4 |
+ | 3 | 2025-08-13 03:41:35.218 | 6 |
+ +-------------+-------------------------+--------------------+
+ ```
+
+- 增量读取
+
+ ```sql
+ SELECT * FROM users_samples@incr('startSnapshotId'=1, 'endSnapshotId'=2) ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
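+
+ 除快照 ID 外,也可以按提交时间戳区间进行增量读取。以下仅为示意,毫秒时间戳为假设值,应根据 snapshots 系统表中实际的 `commit_time` 取值调整。
+
+ ```sql
+ -- 读取 [startTimestamp, endTimestamp) 区间内提交的增量数据,时间戳单位为毫秒
+ SELECT * FROM users_samples@incr('startTimestamp'='1754719000000', 'endTimestamp'='1755054100000') ORDER BY user_id;
+ ```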
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/catalogs/paimon-catalog.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/catalogs/paimon-catalog.md
index cc6f194e6e3..07da5d94d99 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/catalogs/paimon-catalog.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/catalogs/paimon-catalog.md
@@ -246,9 +246,6 @@ SELECT * FROM paimon_ctl.paimon_db.paimon_tbl LIMIT 10;
支持查询指定的快照或时间戳区间内的增量数据。区间为左闭右开区间。
```sql
--- read from snapshot 2
-SELECT * FROM paimon_table@incr('startSnapshotId'='2');
-
-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
@@ -256,33 +253,32 @@ SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');
-- read from start timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000');
-- read between timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000', 'endTimestamp'='1750944949000');
```
参数说明:
| 参数 | 说明 | 示例 |
| --- | --- | -- |
-| `startSnapshotId` | 起始快照 ID,必须大于 0 | `'startSnapshotId'='3'` |
-| `endSnapshotId` | 结束快照 ID,必须大于 `startSnapshotId`。可选,如不指定,则表示从 `startSnapshotId` 开始读取到最新的快照 | `'endSnapshotId'='10'` |
+| `startSnapshotId` | 起始快照 ID,必须大于 0。必须和 `endSnapshotId` 配对使用。 | `'startSnapshotId'='3'` |
+| `endSnapshotId` | 结束快照 ID,必须大于 `startSnapshotId`。必须和 `startSnapshotId` 配对使用。 | `'endSnapshotId'='10'` |
| `incrementalBetweenScanMode` | 指定增量读取的模式,默认 `auto`,支持 `delta`, `changelog` 和 `diff` | `'incrementalBetweenScanMode'='delta'` |
-| `startTimestamp` | 起始快照时间,必须大于等于 0 | `'startTimestamp'='1750844949'` |
-| `endTimestamp` | 结束快照时间,必须大于 `startTimestamp`。可选,如不指定,则表示从 `startTimestamp` 开始读取到最新的快照 | `'endTimestamp'='1750944949'` |
+| `startTimestamp` | 起始快照时间,必须大于等于 0。单位是毫秒。 | `'startTimestamp'='1750844949000'` |
+| `endTimestamp` | 结束快照时间,必须大于 `startTimestamp`。可选,如不指定,则表示从 `startTimestamp` 开始读取到最新的快照。单位是毫秒。 | `'endTimestamp'='1750944949000'` |
> 注:
-
-> - `startSnapshotId` 和 `endSnapshotId` 会组成 Paimon 参数 `'incremental-between'='3,10'`
-
-> - `startTimestamp` 和 `endTimestamp` 会组成 Paimon 参数 `'incremental-between-timestamp'='1750844949,1750944949'`
-
-> - `incrementalBetweenScanMode` 对应 Paimon 参数 `incremental-between-scan-mode`。
+>
+> `startSnapshotId` 和 `endSnapshotId` 会组成 Paimon 参数 `'incremental-between'='3,10'`
+>
+> `startTimestamp` 和 `endTimestamp` 会组成 Paimon 参数 `'incremental-between-timestamp'='1750844949000,1750944949000'`
+>
+> `incrementalBetweenScanMode` 对应 Paimon 参数 `incremental-between-scan-mode`。
可参阅 [Paimon 文档](https://paimon.apache.org/docs/master/maintenance/configurations/) 进一步了解这些参数。
-
## 系统表
> 该功能自 3.1.0 版本支持
diff --git a/sidebars.json b/sidebars.json
index 89899206c55..e6f299b467b 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -478,9 +478,10 @@
"lakehouse/best-practices/doris-iceberg",
"lakehouse/best-practices/doris-lakesoul",
"lakehouse/best-practices/doris-aws-s3tables",
+ "lakehouse/best-practices/doris-dlf-paimon",
+ "lakehouse/best-practices/doris-maxcompute",
"lakehouse/best-practices/tpch",
- "lakehouse/best-practices/tpcds",
- "lakehouse/best-practices/doris-maxcompute"
+ "lakehouse/best-practices/tpcds"
]
}
]
@@ -2435,4 +2436,4 @@
]
}
]
-}
\ No newline at end of file
+}
diff --git a/versioned_docs/version-2.1/lakehouse/best-practices/doris-aws-s3tables.md b/versioned_docs/version-2.1/lakehouse/best-practices/doris-aws-s3tables.md
index cb16b6f8bc3..1c349fe19b5 100644
--- a/versioned_docs/version-2.1/lakehouse/best-practices/doris-aws-s3tables.md
+++ b/versioned_docs/version-2.1/lakehouse/best-practices/doris-aws-s3tables.md
@@ -16,7 +16,7 @@ The release of S3 Tables further simplifies Lakehouse architecture and brings mo
Thanks to Amazon S3 Tables' high compatibility with the Iceberg API, Apache Doris can quickly integrate with S3 Tables. This article will demonstrate how to connect Apache Doris with S3 Tables and perform data analysis and processing.
:::tip
-This feature is supported from Doris 3.1 onwards
+This feature is supported since Doris 3.1
:::
## Usage Guide
diff --git a/versioned_docs/version-2.1/lakehouse/best-practices/doris-dlf-paimon.md b/versioned_docs/version-2.1/lakehouse/best-practices/doris-dlf-paimon.md
new file mode 100644
index 00000000000..91fcf3f4623
--- /dev/null
+++ b/versioned_docs/version-2.1/lakehouse/best-practices/doris-dlf-paimon.md
@@ -0,0 +1,150 @@
+---
+{
+ "title": "Integration with Alibaba DLF",
+ "language": "en"
+}
+
+---
+
+Alibaba Cloud [Data Lake Formation (DLF)](https://www.alibabacloud.com/en/product/datalake-formation) is a core component of cloud-native data lake architecture that helps users quickly build cloud-native data lakes. Data Lake Formation provides unified lake metadata management and enterprise-level permission control, and integrates seamlessly with multiple computing engines to break down data silos and unlock business value.
+
+- Unified Metadata and Storage
+
+ Computing engines share a unified set of lake metadata and storage, enabling data to flow between lake ecosystem products.
+
+- Unified Permission Management
+
+ Computing engines share a unified set of lake table permission configurations: configure once, take effect everywhere.
+
+- Storage Optimization
+
+ Provides optimization strategies such as small-file merging, expired-snapshot cleanup, partition reorganization, and obsolete-file cleanup to improve storage efficiency.
+
+- Comprehensive Cloud Ecosystem Support
+
+ Deep integration with Alibaba Cloud products, including streaming and batch computing engines, delivers out-of-the-box functionality and greater operational convenience.
+
+DLF supports the Paimon REST Catalog starting from version 2.5. Doris supports integrating with the Paimon REST Catalog of DLF 2.5+ starting from version 3.1.0, enabling a seamless connection to DLF for accessing and analyzing Paimon table data. This document demonstrates how to connect Apache Doris to DLF 2.5+ and access Paimon table data.
+
+:::tip
+This feature is supported since Doris 3.1
+:::
+
+## Usage Guide
+
+### 01 Enable DLF Service
+
+Please refer to the official DLF documentation to enable the DLF service and create the corresponding Catalog, Database, and Table.
+
+### 02 Access DLF Using EMR Spark SQL
+
+- Connection
+
+ ```shell
+ spark-sql --master yarn \
+ --conf spark.driver.memory=5g \
+ --conf spark.sql.defaultCatalog=paimon \
+ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
+ --conf spark.sql.catalog.paimon.metastore=rest \
+ --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
+ --conf spark.sql.catalog.paimon.uri=http://<region>-vpc.dlf.aliyuncs.com \
+ --conf spark.sql.catalog.paimon.warehouse=<your-catalog-name> \
+ --conf spark.sql.catalog.paimon.token.provider=dlf \
+ --conf spark.sql.catalog.paimon.dlf.token-loader=ecs
+ ```
+
+ > Replace the `warehouse` and `uri` values with your own.
+
+- Write Data
+
+ ```sql
+ USE <your-catalog-name>;
+
+ CREATE TABLE users_samples
+ (
+ user_id INT,
+ age_level STRING,
+ final_gender_code STRING,
+ clk BOOLEAN
+ );
+
+ INSERT INTO users_samples VALUES
+ (1, '25-34', 'M', true),
+ (2, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (3, '25-34', 'M', true),
+ (4, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (5, '25-34', 'M', true),
+ (6, '18-24', 'F', false);
+ ```
+
+ If you encounter the following error, try removing `paimon-jindo-x.y.z.jar` from `/opt/apps/PAIMON/paimon-dlf-2.5/lib/spark3` and restarting the Spark service before retrying.
+
+ ```
+ Ambiguous FileIO classes are:
+ org.apache.paimon.jindo.JindoLoader
+ org.apache.paimon.oss.OSSLoader
+ ```
+
+### 03 Connect Doris to DLF
+
+- Create Paimon Catalog
+
+ ```sql
+ CREATE CATALOG paimon_dlf_test PROPERTIES (
+ 'type' = 'paimon',
+ 'paimon.catalog.type' = 'rest',
+ 'uri' = 'http://<region>-vpc.dlf.aliyuncs.com',
+ 'warehouse' = '<your-catalog-name>',
+ 'paimon.rest.token.provider' = 'dlf',
+ 'paimon.rest.dlf.access-key-id' = '<ak>',
+ 'paimon.rest.dlf.access-key-secret' = '<sk>'
+ );
+ ```
+
+ - Doris uses temporary credentials returned by DLF to access OSS object storage; no additional OSS credentials are required.
+ - DLF can only be accessed from within the same VPC, so make sure the `uri` address is correct.
+
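+ After the catalog is created, a quick way to verify the connection is to switch to the catalog and list its databases. The following is a minimal verification sketch; `<your-database-name>` is a placeholder for a database created in DLF.
+
+ ```sql
+ -- Switch to the newly created catalog
+ SWITCH paimon_dlf_test;
+ -- List the databases synchronized from DLF
+ SHOW DATABASES;
+ -- Switch to the target database before querying
+ USE <your-database-name>;
+ ```
+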
+- Query Data
+
+ ```sql
+ SELECT * FROM users_samples ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 1 | 25-34 | M | 1 |
+ | 2 | 18-24 | F | 0 |
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ | 5 | 25-34 | M | 1 |
+ | 6 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
+
+- Query System Tables
+
+ ```sql
+ SELECT snapshot_id, commit_time, total_record_count FROM users_samples$snapshots;
+ +-------------+-------------------------+--------------------+
+ | snapshot_id | commit_time | total_record_count |
+ +-------------+-------------------------+--------------------+
+ | 1 | 2025-08-09 05:56:02.906 | 2 |
+ | 2 | 2025-08-13 03:41:32.732 | 4 |
+ | 3 | 2025-08-13 03:41:35.218 | 6 |
+ +-------------+-------------------------+--------------------+
+ ```
+
+- Batch Incremental Reading
+
+ ```sql
+ SELECT * FROM users_samples@incr('startSnapshotId'=1, 'endSnapshotId'=2) ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
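+
+ Incremental reads can also be bounded by commit timestamps instead of snapshot IDs. The sketch below is illustrative only; the millisecond timestamps are hypothetical and should be taken from the actual `commit_time` values in the snapshots system table.
+
+ ```sql
+ -- Read changes committed within [startTimestamp, endTimestamp), in milliseconds
+ SELECT * FROM users_samples@incr('startTimestamp'='1754719000000', 'endTimestamp'='1755054100000') ORDER BY user_id;
+ ```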
diff --git a/versioned_docs/version-2.1/lakehouse/catalogs/paimon-catalog.md b/versioned_docs/version-2.1/lakehouse/catalogs/paimon-catalog.md
index ce9253d81f7..f2df00afdb1 100644
--- a/versioned_docs/version-2.1/lakehouse/catalogs/paimon-catalog.md
+++ b/versioned_docs/version-2.1/lakehouse/catalogs/paimon-catalog.md
@@ -246,9 +246,6 @@ Supports [Batch Incremental](https://paimon.apache.org/docs/master/flink/sql-que
Supports querying incremental data within specified snapshot or timestamp intervals. The interval is left-closed and right-open.
```sql
--- read from snapshot 2
-SELECT * FROM paimon_table@incr('startSnapshotId'='2');
-
-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
@@ -256,29 +253,29 @@ SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');
-- read from start timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000');
-- read between timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000', 'endTimestamp'='1750944949000');
```
Parameter:
| Parameter | Description | Example |
| --- | --- | -- |
-| `startSnapshotId` | Starting snapshot ID, must be greater than 0 | `'startSnapshotId'='3'` |
-| `endSnapshotId` | Ending snapshot ID, must be greater than `startSnapshotId`. Optional, if not specified, reads from `startSnapshotId` to the latest snapshot | `'endSnapshotId'='10'` |
+| `startSnapshotId` | Starting snapshot ID, must be greater than 0. Must be specified together with `endSnapshotId`. | `'startSnapshotId'='3'` |
+| `endSnapshotId` | Ending snapshot ID, must be greater than `startSnapshotId`. Must be specified together with `startSnapshotId`. | `'endSnapshotId'='10'` |
| `incrementalBetweenScanMode` | Specifies the incremental read mode, default is `auto`, supports `delta`, `changelog` and `diff` | `'incrementalBetweenScanMode'='delta'` |
-| `startTimestamp` | Starting snapshot timestamp, must be greater than or equal to 0 | `'startTimestamp'='1750844949'` |
-| `endTimestamp` | Ending snapshot timestamp, must be greater than `startTimestamp`. Optional, if not specified, reads from `startTimestamp` to the latest snapshot | `'endTimestamp'='1750944949'` |
+| `startTimestamp` | Starting snapshot timestamp, must be greater than or equal to 0, in milliseconds. | `'startTimestamp'='1750844949000'` |
+| `endTimestamp` | Ending snapshot timestamp, must be greater than `startTimestamp`. Optional; if not specified, reads from `startTimestamp` to the latest snapshot. In milliseconds. | `'endTimestamp'='1750944949000'` |
> Notice:
-
-> - `startSnapshotId` and `endSnapshotId` will compose the Paimon parameter `'incremental-between'='3,10'`
-
-> - `startTimestamp` and `endTimestamp` will compose the Paimon parameter `'incremental-between-timestamp'='1750844949,1750944949'`
-
-> - `incrementalBetweenScanMode` corresponds to the Paimon parameter `incremental-between-scan-mode`.
+>
+> `startSnapshotId` and `endSnapshotId` are combined into the Paimon parameter `'incremental-between'='3,10'`
+>
+> `startTimestamp` and `endTimestamp` are combined into the Paimon parameter `'incremental-between-timestamp'='1750844949000,1750944949000'`
+>
+> `incrementalBetweenScanMode` corresponds to the Paimon parameter `incremental-between-scan-mode`.
Refer to the [Paimon documentation](https://paimon.apache.org/docs/master/maintenance/configurations/) for further details about these parameters.
diff --git a/versioned_docs/version-3.0/lakehouse/best-practices/doris-aws-s3tables.md b/versioned_docs/version-3.0/lakehouse/best-practices/doris-aws-s3tables.md
index cb16b6f8bc3..1c349fe19b5 100644
--- a/versioned_docs/version-3.0/lakehouse/best-practices/doris-aws-s3tables.md
+++ b/versioned_docs/version-3.0/lakehouse/best-practices/doris-aws-s3tables.md
@@ -16,7 +16,7 @@ The release of S3 Tables further simplifies Lakehouse architecture and brings mo
Thanks to Amazon S3 Tables' high compatibility with the Iceberg API, Apache Doris can quickly integrate with S3 Tables. This article will demonstrate how to connect Apache Doris with S3 Tables and perform data analysis and processing.
:::tip
-This feature is supported from Doris 3.1 onwards
+This feature is supported since Doris 3.1
:::
## Usage Guide
diff --git a/versioned_docs/version-3.0/lakehouse/best-practices/doris-dlf-paimon.md b/versioned_docs/version-3.0/lakehouse/best-practices/doris-dlf-paimon.md
new file mode 100644
index 00000000000..91fcf3f4623
--- /dev/null
+++ b/versioned_docs/version-3.0/lakehouse/best-practices/doris-dlf-paimon.md
@@ -0,0 +1,150 @@
+---
+{
+ "title": "Integration with Alibaba DLF",
+ "language": "en"
+}
+
+---
+
+Alibaba Cloud [Data Lake Formation (DLF)](https://www.alibabacloud.com/en/product/datalake-formation) is a core component of cloud-native data lake architecture that helps users quickly build cloud-native data lakes. Data Lake Formation provides unified lake metadata management and enterprise-level permission control, and integrates seamlessly with multiple computing engines to break down data silos and unlock business value.
+
+- Unified Metadata and Storage
+
+ Computing engines share a unified set of lake metadata and storage, enabling data to flow between lake ecosystem products.
+
+- Unified Permission Management
+
+ Computing engines share a unified set of lake table permission configurations: configure once, take effect everywhere.
+
+- Storage Optimization
+
+ Provides optimization strategies such as small-file merging, expired-snapshot cleanup, partition reorganization, and obsolete-file cleanup to improve storage efficiency.
+
+- Comprehensive Cloud Ecosystem Support
+
+ Deep integration with Alibaba Cloud products, including streaming and batch computing engines, delivers out-of-the-box functionality and greater operational convenience.
+
+DLF supports the Paimon REST Catalog starting from version 2.5. Doris supports integrating with the Paimon REST Catalog of DLF 2.5+ starting from version 3.1.0, enabling a seamless connection to DLF for accessing and analyzing Paimon table data. This document demonstrates how to connect Apache Doris to DLF 2.5+ and access Paimon table data.
+
+:::tip
+This feature is supported since Doris 3.1
+:::
+
+## Usage Guide
+
+### 01 Enable DLF Service
+
+Please refer to the official DLF documentation to enable the DLF service and create the corresponding Catalog, Database, and Table.
+
+### 02 Access DLF Using EMR Spark SQL
+
+- Connection
+
+ ```shell
+ spark-sql --master yarn \
+ --conf spark.driver.memory=5g \
+ --conf spark.sql.defaultCatalog=paimon \
+ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
+ --conf spark.sql.catalog.paimon.metastore=rest \
+ --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
+ --conf spark.sql.catalog.paimon.uri=http://<region>-vpc.dlf.aliyuncs.com \
+ --conf spark.sql.catalog.paimon.warehouse=<your-catalog-name> \
+ --conf spark.sql.catalog.paimon.token.provider=dlf \
+ --conf spark.sql.catalog.paimon.dlf.token-loader=ecs
+ ```
+
+ > Replace the `warehouse` and `uri` values with your own.
+
+- Write Data
+
+ ```sql
+ USE <your-catalog-name>;
+
+ CREATE TABLE users_samples
+ (
+ user_id INT,
+ age_level STRING,
+ final_gender_code STRING,
+ clk BOOLEAN
+ );
+
+ INSERT INTO users_samples VALUES
+ (1, '25-34', 'M', true),
+ (2, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (3, '25-34', 'M', true),
+ (4, '18-24', 'F', false);
+
+ INSERT INTO users_samples VALUES
+ (5, '25-34', 'M', true),
+ (6, '18-24', 'F', false);
+ ```
+
+ If you encounter the following error, try removing `paimon-jindo-x.y.z.jar` from `/opt/apps/PAIMON/paimon-dlf-2.5/lib/spark3` and restarting the Spark service before retrying.
+
+ ```
+ Ambiguous FileIO classes are:
+ org.apache.paimon.jindo.JindoLoader
+ org.apache.paimon.oss.OSSLoader
+ ```
+
+### 03 Connect Doris to DLF
+
+- Create Paimon Catalog
+
+ ```sql
+ CREATE CATALOG paimon_dlf_test PROPERTIES (
+ 'type' = 'paimon',
+ 'paimon.catalog.type' = 'rest',
+ 'uri' = 'http://<region>-vpc.dlf.aliyuncs.com',
+ 'warehouse' = '<your-catalog-name>',
+ 'paimon.rest.token.provider' = 'dlf',
+ 'paimon.rest.dlf.access-key-id' = '<ak>',
+ 'paimon.rest.dlf.access-key-secret' = '<sk>'
+ );
+ ```
+
+ - Doris uses temporary credentials returned by DLF to access OSS object storage; no additional OSS credentials are required.
+ - DLF can only be accessed from within the same VPC, so make sure the `uri` address is correct.
+
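+ After the catalog is created, a quick way to verify the connection is to switch to the catalog and list its databases. The following is a minimal verification sketch; `<your-database-name>` is a placeholder for a database created in DLF.
+
+ ```sql
+ -- Switch to the newly created catalog
+ SWITCH paimon_dlf_test;
+ -- List the databases synchronized from DLF
+ SHOW DATABASES;
+ -- Switch to the target database before querying
+ USE <your-database-name>;
+ ```
+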
+- Query Data
+
+ ```sql
+ SELECT * FROM users_samples ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 1 | 25-34 | M | 1 |
+ | 2 | 18-24 | F | 0 |
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ | 5 | 25-34 | M | 1 |
+ | 6 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
+
+- Query System Tables
+
+ ```sql
+ SELECT snapshot_id, commit_time, total_record_count FROM users_samples$snapshots;
+ +-------------+-------------------------+--------------------+
+ | snapshot_id | commit_time | total_record_count |
+ +-------------+-------------------------+--------------------+
+ | 1 | 2025-08-09 05:56:02.906 | 2 |
+ | 2 | 2025-08-13 03:41:32.732 | 4 |
+ | 3 | 2025-08-13 03:41:35.218 | 6 |
+ +-------------+-------------------------+--------------------+
+ ```
+
+- Batch Incremental Reading
+
+ ```sql
+ SELECT * FROM users_samples@incr('startSnapshotId'=1, 'endSnapshotId'=2) ORDER BY user_id;
+ +---------+-----------+-------------------+------+
+ | user_id | age_level | final_gender_code | clk |
+ +---------+-----------+-------------------+------+
+ | 3 | 25-34 | M | 1 |
+ | 4 | 18-24 | F | 0 |
+ +---------+-----------+-------------------+------+
+ ```
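+
+ Incremental reads can also be bounded by commit timestamps instead of snapshot IDs. The sketch below is illustrative only; the millisecond timestamps are hypothetical and should be taken from the actual `commit_time` values in the snapshots system table.
+
+ ```sql
+ -- Read changes committed within [startTimestamp, endTimestamp), in milliseconds
+ SELECT * FROM users_samples@incr('startTimestamp'='1754719000000', 'endTimestamp'='1755054100000') ORDER BY user_id;
+ ```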
diff --git a/versioned_docs/version-3.0/lakehouse/catalogs/paimon-catalog.md b/versioned_docs/version-3.0/lakehouse/catalogs/paimon-catalog.md
index ce9253d81f7..f2df00afdb1 100644
--- a/versioned_docs/version-3.0/lakehouse/catalogs/paimon-catalog.md
+++ b/versioned_docs/version-3.0/lakehouse/catalogs/paimon-catalog.md
@@ -246,9 +246,6 @@ Supports [Batch Incremental](https://paimon.apache.org/docs/master/flink/sql-que
Supports querying incremental data within specified snapshot or timestamp intervals. The interval is left-closed and right-open.
```sql
--- read from snapshot 2
-SELECT * FROM paimon_table@incr('startSnapshotId'='2');
-
-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
@@ -256,29 +253,29 @@ SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');
-- read from start timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000');
-- read between timestamp
-SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');
+SELECT * FROM paimon_table@incr('startTimestamp'='1750844949000', 'endTimestamp'='1750944949000');
```
Parameter:
| Parameter | Description | Example |
| --- | --- | -- |
-| `startSnapshotId` | Starting snapshot ID, must be greater than 0 | `'startSnapshotId'='3'` |
-| `endSnapshotId` | Ending snapshot ID, must be greater than `startSnapshotId`. Optional, if not specified, reads from `startSnapshotId` to the latest snapshot | `'endSnapshotId'='10'` |
+| `startSnapshotId` | Starting snapshot ID, must be greater than 0. Must be specified together with `endSnapshotId`. | `'startSnapshotId'='3'` |
+| `endSnapshotId` | Ending snapshot ID, must be greater than `startSnapshotId`. Must be specified together with `startSnapshotId`. | `'endSnapshotId'='10'` |
| `incrementalBetweenScanMode` | Specifies the incremental read mode, default is `auto`, supports `delta`, `changelog` and `diff` | `'incrementalBetweenScanMode'='delta'` |
-| `startTimestamp` | Starting snapshot timestamp, must be greater than or equal to 0 | `'startTimestamp'='1750844949'` |
-| `endTimestamp` | Ending snapshot timestamp, must be greater than `startTimestamp`. Optional, if not specified, reads from `startTimestamp` to the latest snapshot | `'endTimestamp'='1750944949'` |
+| `startTimestamp` | Starting snapshot timestamp, must be greater than or equal to 0, in milliseconds. | `'startTimestamp'='1750844949000'` |
+| `endTimestamp` | Ending snapshot timestamp, must be greater than `startTimestamp`. Optional; if not specified, reads from `startTimestamp` to the latest snapshot. In milliseconds. | `'endTimestamp'='1750944949000'` |
> Notice:
-
-> - `startSnapshotId` and `endSnapshotId` will compose the Paimon parameter `'incremental-between'='3,10'`
-
-> - `startTimestamp` and `endTimestamp` will compose the Paimon parameter `'incremental-between-timestamp'='1750844949,1750944949'`
-
-> - `incrementalBetweenScanMode` corresponds to the Paimon parameter `incremental-between-scan-mode`.
+>
+> `startSnapshotId` and `endSnapshotId` are combined into the Paimon parameter `'incremental-between'='3,10'`
+>
+> `startTimestamp` and `endTimestamp` are combined into the Paimon parameter `'incremental-between-timestamp'='1750844949000,1750944949000'`
+>
+> `incrementalBetweenScanMode` corresponds to the Paimon parameter `incremental-between-scan-mode`.
Refer to the [Paimon documentation](https://paimon.apache.org/docs/master/maintenance/configurations/) for further details about these parameters.
diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json
index 427cf13b1ea..990f0ab4c8e 100644
--- a/versioned_sidebars/version-2.1-sidebars.json
+++ b/versioned_sidebars/version-2.1-sidebars.json
@@ -444,9 +444,10 @@
"lakehouse/best-practices/doris-iceberg",
"lakehouse/best-practices/doris-lakesoul",
"lakehouse/best-practices/doris-aws-s3tables",
+ "lakehouse/best-practices/doris-dlf-paimon",
+ "lakehouse/best-practices/doris-maxcompute",
"lakehouse/best-practices/tpch",
- "lakehouse/best-practices/tpcds",
- "lakehouse/best-practices/doris-maxcompute"
+ "lakehouse/best-practices/tpcds"
]
}
]
diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json
index e35fb7eab40..1c4a416f41d 100644
--- a/versioned_sidebars/version-3.0-sidebars.json
+++ b/versioned_sidebars/version-3.0-sidebars.json
@@ -469,9 +469,10 @@
"lakehouse/best-practices/doris-iceberg",
"lakehouse/best-practices/doris-lakesoul",
"lakehouse/best-practices/doris-aws-s3tables",
+ "lakehouse/best-practices/doris-dlf-paimon",
+ "lakehouse/best-practices/doris-maxcompute",
"lakehouse/best-practices/tpch",
- "lakehouse/best-practices/tpcds",
- "lakehouse/best-practices/doris-maxcompute"
+ "lakehouse/best-practices/tpcds"
]
}
]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]