This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 718a6947a4a [opt](storage) add some storage faq and modify format (#2308)
718a6947a4a is described below

commit 718a6947a4a474f9fa72cdf5f4f66913dd0b7569
Author: Mingyu Chen (Rayner) <morning...@163.com>
AuthorDate: Mon Apr 21 19:24:13 2025 -0700

    [opt](storage) add some storage faq and modify format (#2308)
    
    ## Versions
    
    - [x] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [x] Checked by AI
    - [ ] Test Cases Built
---
 docs/lakehouse/storages/aliyun-oss.md              |  3 +-
 docs/lakehouse/storages/hdfs.md                    | 49 ++++++++++++++++-
 docs/lakehouse/storages/huawei-obs.md              |  5 +-
 docs/lakehouse/storages/s3.md                      | 61 +++++++++++++++++++++-
 docs/lakehouse/storages/tencent-cos.md             |  2 -
 .../current/lakehouse/storages/aliyun-oss.md       |  3 +-
 .../current/lakehouse/storages/hdfs.md             | 48 ++++++++++++++++-
 .../current/lakehouse/storages/huawei-obs.md       |  5 +-
 .../current/lakehouse/storages/s3.md               |  3 ++
 .../current/lakehouse/storages/tencent-cos.md      |  2 -
 10 files changed, 164 insertions(+), 17 deletions(-)

diff --git a/docs/lakehouse/storages/aliyun-oss.md b/docs/lakehouse/storages/aliyun-oss.md
index feb76997898..46d779ad47f 100644
--- a/docs/lakehouse/storages/aliyun-oss.md
+++ b/docs/lakehouse/storages/aliyun-oss.md
@@ -24,8 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Aliyun OSS Access Parameters
-
 This document introduces the parameters required to access Aliyun OSS, applicable to the following scenarios:
 
 - Catalog properties
@@ -37,6 +35,7 @@ This document introduces the parameters required to access Aliyun OSS, applicabl
 **Doris uses the S3 Client to access Aliyun OSS through the S3 compatible protocol.**
 
 ## Parameter Overview
+
 | Property Name                     | Former Name      | Description                                                      | Default | Required |
 |-----------------------------------|------------------|------------------------------------------------------------------|---------|----------|
 | `s3.endpoint`                     | `oss.endpoint`   | OSS endpoint, specifies the access endpoint for Aliyun OSS. Note that the endpoints for OSS and OSS HDFS are different. |         | Yes      |
diff --git a/docs/lakehouse/storages/hdfs.md b/docs/lakehouse/storages/hdfs.md
index 0364e58b391..0dde8ecf4ce 100644
--- a/docs/lakehouse/storages/hdfs.md
+++ b/docs/lakehouse/storages/hdfs.md
@@ -23,7 +23,7 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-# HDFS 
+
 This document is used to introduce the parameters required when accessing HDFS. These parameters apply to:
 - Catalog properties.
 - Table Valued Function properties.
@@ -33,6 +33,7 @@ This document is used to introduce the parameters required when accessing HDFS.
 - Backup and restore
 
 ## Parameter Overview
+
 | Property Name                            | Former Name                      | Description                                                              | Default Value | Required |
 |------------------------------------------|----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|
 | `hdfs.authentication.type`               | `hadoop.security.authentication` | Authentication type for accessing HDFS. Supports `kerberos` and `simple` | `simple`      | No       |
@@ -109,3 +110,49 @@ The configuration file directory must include `hdfs-site.xml` and `core-site.xml
 
 If the configuration file contains the parameters mentioned in the document above, the parameters explicitly configured by the user will be used preferentially. The configuration file can specify multiple files, separated by commas. For example, `hadoop/conf/core-site.xml,hadoop/conf/hdfs-site.xml`.
 
+## IO Optimization
+
+### Hedged Read
+
+In some cases, high HDFS load can make reading a data replica from HDFS take longer, which slows down the overall query. The HDFS Client provides a Hedged Read feature for this case.
+When a read request has not returned within a configured threshold, this feature starts another read thread for the same data and uses whichever result returns first.
+
+Note: This feature may increase the load on the HDFS cluster; use it with caution.
+
+You can enable this feature as follows:
+
+```sql
+create catalog regression properties (
+  'type'='hms',
+  'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
+  'dfs.client.hedged.read.threadpool.size' = '128',
+  'dfs.client.hedged.read.threshold.millis' = "500"
+);
+```
+
+`dfs.client.hedged.read.threadpool.size` indicates the number of threads for Hedged Read, shared by one HDFS Client. Typically, BE nodes share one HDFS Client for one HDFS cluster.
+
+`dfs.client.hedged.read.threshold.millis` is the read threshold in milliseconds. Hedged Read is triggered when a read request exceeds this threshold.
+
+After enabling it, you can see the related parameters in the Query Profile:
+
+`TotalHedgedRead`: Number of Hedged Reads initiated.
+
+`HedgedReadWins`: Number of successful Hedged Reads (initiated and returned faster than the original request).
+
+Note that these values are cumulative for a single HDFS Client, not per query. The same HDFS Client is reused by multiple queries.
+
+### `dfs.client.socket-timeout`
+
+`dfs.client.socket-timeout` is a client configuration parameter in Hadoop HDFS that sets the socket timeout, in milliseconds, used when the client establishes a connection to or reads data from a DataNode or NameNode. The default value is typically 60,000 milliseconds.
+
+Reducing this value lets clients time out faster and retry or switch to other nodes when they encounter network delays, slow DataNode responses, or connection issues. This helps reduce waiting time and improve system response time. For example, in some tests, setting `dfs.client.socket-timeout` to a smaller value (such as 5000 milliseconds) can quickly detect DataNode delays or failures, avoiding long waits.
+
+Note:
+
+- Setting the timeout too low may cause frequent timeout errors during network fluctuations or high node load, affecting task stability.
+- It's recommended to adjust this parameter based on the actual network environment and system load to balance response time and system stability.
+- This parameter should be set in client configuration files (such as `hdfs-site.xml`) to ensure clients use the correct timeout value when communicating with HDFS.
+
+In summary, properly configuring `dfs.client.socket-timeout` can improve I/O response time while ensuring system stability and reliability.
+
diff --git a/docs/lakehouse/storages/huawei-obs.md b/docs/lakehouse/storages/huawei-obs.md
index a6dfbfcb3af..2d1d6f5f714 100644
--- a/docs/lakehouse/storages/huawei-obs.md
+++ b/docs/lakehouse/storages/huawei-obs.md
@@ -24,8 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## Huawei Cloud OBS Access Parameters
-
 This document introduces the parameters required to access Huawei Cloud OBS, applicable to the following scenarios:
 
 - Catalog properties
@@ -35,7 +33,8 @@ This document introduces the parameters required to access Huawei Cloud OBS, app
 - Outfile properties
 
 **Doris uses the S3 Client to access Huawei Cloud OBS through the S3 compatible protocol.**
-### Parameter Overview
+
+## Parameter Overview
 
 | Property Name                     | Former Name      | Description                                | Default | Required |
 |-----------------------------------|------------------|--------------------------------------------|---------|----------|
diff --git a/docs/lakehouse/storages/s3.md b/docs/lakehouse/storages/s3.md
index bcc4ff86633..f8740a2d736 100644
--- a/docs/lakehouse/storages/s3.md
+++ b/docs/lakehouse/storages/s3.md
@@ -24,5 +24,64 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-The document is under development, please refer to versioned doc 2.1 or 3.0
+---
+{
+    "title": "S3",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document describes the parameters required for accessing AWS S3. These parameters apply to:
+
+- Catalog properties
+- Table Valued Function properties
+- Broker Load properties
+- Export properties
+- Outfile properties
+
+## Parameter Overview
+
+| Property Name                 | Former Name    | Description                                    | Default Value | Required |
+|------------------------------|----------------|------------------------------------------------|--------------|----------|
+| `s3.endpoint`                 | S3 endpoint    | S3 endpoint                                    |              | Yes      |
+| `s3.region`                   | S3 region      | S3 region                                      |              | No       |
+| `s3.access_key`               | S3 access key  | S3 access key                                  |              | Yes      |
+| `s3.secret_key`               | S3 secret key  | S3 secret key                                  |              | Yes      |
+| `s3.use_path_style`           | use_path_style | Whether to use path-style access to S3. Used when accessing certain S3-compatible object storage that doesn't support host-style | `false`      | No       |
+| `s3.connection.maximum`       |                | Maximum S3 connections                         | `50`         | No       |
+| `s3.connection.request.timeout` |                | S3 request timeout in milliseconds             | `3000`       | No       |
+| `s3.connection.timeout`       |                | S3 connection timeout in milliseconds          | `1000`       | No       |
+
+### Authentication Configuration
+
+When accessing AWS S3, you need to provide an AWS Access Key and an AWS Secret Key through the following parameters:
+- s3.access_key
+- s3.secret_key
+
+### Example Configuration
 
+```properties
+"s3.access_key" = "ak"
+"s3.secret_key" = "sk"
+"s3.endpoint" = "s3.us-east-1.amazonaws.com"
+"s3.region" = "us-east-1"
+```
diff --git a/docs/lakehouse/storages/tencent-cos.md b/docs/lakehouse/storages/tencent-cos.md
index dfab7aba931..7edaf295cc1 100644
--- a/docs/lakehouse/storages/tencent-cos.md
+++ b/docs/lakehouse/storages/tencent-cos.md
@@ -24,8 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## Tencent Cloud COS Access Parameters
-
 This document introduces the parameters required to access Tencent Cloud COS, applicable to the following scenarios:
 
 - Catalog properties
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/aliyun-oss.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/aliyun-oss.md
index 9da4fa5621e..eb2df1191aa 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/aliyun-oss.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/aliyun-oss.md
@@ -24,8 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# 阿里云 OSS 访问参数
-
 本文档介绍访问阿里云 OSS 所需的参数,这些参数适用于以下场景:
 
 - Catalog 属性
@@ -37,6 +35,7 @@ under the License.
 **Doris 使用 S3 Client,通过 S3 兼容协议访问阿里云 OSS。**
 
 ## 参数总览
+
 | 属性名称                            | 曾用名              | 描述                                                             | 默认值    | 是否必须 |
 |---------------------------------|------------------|----------------------------------------------------------------|--------|------|
 | `s3.endpoint`                   | `oss.endpoint`   | OSS endpoint,指定阿里云 OSS 的访问端点。注意,OSS 和 OSS HDFS 的 endpoint 不相同。 |        | 是    |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md
index f7e504013af..29cbd165126 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md
@@ -23,7 +23,7 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-# HDFS 
+
 本文档用于介绍访问 HDFS 时所需的参数。这些参数适用于:
 - Catalog 属性。
 - Table Valued Function 属性。
@@ -33,6 +33,7 @@ under the License.
 - 备份恢复
 
 ## 参数总览
+
 | 属性名称                                     | 曾用名                              | 描述                                         | 默认值      | 是否必须 |
 |------------------------------------------|----------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|------|
 | `hdfs.authentication.type`               | `hadoop.security.authentication` | 访问 HDFS 的认证类型。支持 `kerberos` 和 `simple` | `simple` | 否    |
@@ -108,4 +109,49 @@ Doris 支持通过 `hadoop.config.resources` 参数来指定 HDFS 相关配置
 
 如果配置文件包含文档上述参数,则优先使用用户显示配置的参数。配置文件可以指定多个文件,多个文件以逗号分隔。如 `hadoop/conf/core-site.xml,hadoop/conf/hdfs-site.xml`。
 
+## IO 优化
+
+### Hedged Read
+
+在某些情况下,HDFS 的负载较高可能导致读取某个 HDFS 上的数据副本的时间较长,从而拖慢整体的查询效率。HDFS Client 提供了 Hedged Read 功能。
+该功能可以在一个读请求超过一定阈值未返回时,启动另一个读线程读取同一份数据,哪个先返回就是用哪个结果。
+
+注意:该功能可能会增加 HDFS 集群的负载,请酌情使用。
+
+可以通过以下方式开启这个功能:
+
+```
+create catalog regression properties (
+    'type'='hms',
+    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
+    'dfs.client.hedged.read.threadpool.size' = '128',
+    'dfs.client.hedged.read.threshold.millis' = "500"
+);
+```
+
+`dfs.client.hedged.read.threadpool.size` 表示用于 Hedged Read 的线程数,这些线程由一个 HDFS Client 共享。通常情况下,针对一个 HDFS 集群,BE 节点会共享一个 HDFS Client。
+
+`dfs.client.hedged.read.threshold.millis` 是读取阈值,单位毫秒。当一个读请求超过这个阈值未返回时,会触发 Hedged Read。
+
+开启后,可以在 Query Profile 中看到相关参数:
+
+`TotalHedgedRead`: 发起 Hedged Read 的次数。
+
+`HedgedReadWins`:Hedged Read 成功的次数(发起并且比原请求更快返回的次数)
+
+注意,这里的值是单个 HDFS Client 的累计值,而不是单个查询的数值。同一个 HDFS Client 会被多个查询复用。
+
+### `dfs.client.socket-timeout`
+
+`dfs.client.socket-timeout` 是 Hadoop HDFS 中的一个客户端配置参数,用于设置客户端与 DataNode 或 NameNode 之间建立连接或读取数据时的套接字(socket)超时时间,单位为毫秒。该参数的默认值通常为 60,000 毫秒。
+
+将该参数的值调小,可以使客户端在遇到网络延迟、DataNode 响应慢或连接异常等问题时,更快地超时并进行重试或切换到其他节点。这有助于减少等待时间,提高系统的响应速度。例如,在某些测试中,将 `dfs.client.socket-timeout` 设置为较小的值(如 5000 毫秒),可以迅速检测到 DataNode 的延迟或故障,从而避免长时间的等待。
+
+注意:
+
+- 将超时时间设置得过小可能导致在网络波动或节点负载较高时频繁出现超时错误,影响任务的稳定性。
+- 建议根据实际网络环境和系统负载情况,合理调整该参数的值,以在响应速度和系统稳定性之间取得平衡。
+- 该参数应在客户端配置文件(如 `hdfs-site.xml`)中设置,确保客户端在与 HDFS 通信时使用正确的超时时间。
+
+总之,合理配置 `dfs.client.socket-timeout` 参数,可以在提高 I/O 响应速度的同时,确保系统的稳定性和可靠性。
 
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/huawei-obs.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/huawei-obs.md
index b75c0dea213..b9bbb20a4e9 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/huawei-obs.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/huawei-obs.md
@@ -24,8 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## 华为云 OBS 访问参数
-
 本文档介绍访问华为云 OBS 所需的参数,这些参数适用于以下场景:
 
 - Catalog 属性
@@ -35,7 +33,8 @@ under the License.
 - Outfile 属性
 
 **Doris 使用 S3 Client,通过 S3 兼容协议访问华为云 OBS。**
-### 参数总览
+
+## 参数总览
 
 | 属性名称                            | 曾用名              | 描述                                    | 默认值    | 是否必须 |
 |---------------------------------|------------------|---------------------------------------|--------|------|
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/s3.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/s3.md
index 035808ef054..2ae93222f51 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/s3.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/s3.md
@@ -32,6 +32,7 @@ under the License.
 - Outfile 属性。
 
 ## 参数总览
+
 | 属性名称                     | 曾用名         | 描述                                           | 默认值        | 是否必须 |
 |------------------------------|----------------|------------------------------------------------|--------------|----------|
 | `s3.endpoint`                 | S3 endpoint    | S3 endpoint                                    |              | 是       |
@@ -44,11 +45,13 @@ under the License.
 | `s3.connection.timeout`       |                | S3 连接超时时间,单位毫秒                     | `1000`       | 否       |
 
 ### 认证配置
+
 访问 AWS S3 时,需要提供 AWS Access Key 和 AWS Secret Key, 即下列参数:
 - s3.access_key
 - s3.secret_key
 
 ### 示例配置
+
 ```properties
 "s3.access_key" = "ak"
 "s3.secret_key" = "sk"
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/tencent-cos.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/tencent-cos.md
index 566033d4643..b1475e10fb4 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/tencent-cos.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/tencent-cos.md
@@ -24,8 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## 腾讯云 COS 访问参数
-
 本文档介绍访问腾讯云 COS 所需的参数,这些参数适用于以下场景:
 
 - Catalog 属性

