This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 62acb0d3781 [opt](hdfs) add hdfs client tool (#2613)
62acb0d3781 is described below

commit 62acb0d37811f1d18eb0da3d2a4f3b337b048e30
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Thu Jul 10 16:13:30 2025 -0700

    [opt](hdfs) add hdfs client tool (#2613)
    
    ## Versions
    
    - [x] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/lakehouse/storages/hdfs.md                       | 17 +++++++++++++++++
 .../current/lakehouse/storages/hdfs.md                | 19 ++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/docs/lakehouse/storages/hdfs.md b/docs/lakehouse/storages/hdfs.md
index 5128ba53fb9..54cebc0705d 100644
--- a/docs/lakehouse/storages/hdfs.md
+++ b/docs/lakehouse/storages/hdfs.md
@@ -137,3 +137,20 @@ Note:
 
 In summary, properly configuring `dfs.client.socket-timeout` can improve I/O response time while ensuring system stability and reliability.
 
+## Debugging HDFS
+
+Hadoop environments are complex to configure, and connectivity failures or poor access performance can occur in some cases. The following third-party tools help users quickly troubleshoot connectivity issues and basic performance problems.
+
+### HDFS Client
+
+- Java: [https://github.com/morningman/hdfs-client-java](https://github.com/morningman/hdfs-client-java)
+
+- CPP: [https://github.com/morningman/hdfs-client-cpp](https://github.com/morningman/hdfs-client-cpp)
+
+These two tools can be used to quickly verify HDFS connectivity and read performance. Most of their Hadoop dependencies are the same as Doris's own, so they reproduce Doris's HDFS access as closely as possible.
+
+The Java version accesses HDFS through Java, which simulates how the Doris FE accesses HDFS.
+
+The CPP version accesses HDFS through C++ and libhdfs, which simulates how the Doris BE accesses HDFS.
+
+For usage instructions, refer to the README of each repository.
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md
index 74eb542d43b..2898e7db4ac 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/storages/hdfs.md
@@ -122,7 +122,7 @@ create catalog regression properties (
 
 注意,这里的值是单个 HDFS Client 的累计值,而不是单个查询的数值。同一个 HDFS Client 会被多个查询复用。
 
-### `dfs.client.socket-timeout`
+### dfs.client.socket-timeout
 
 `dfs.client.socket-timeout` 是 Hadoop HDFS 中的一个客户端配置参数,用于设置客户端与 DataNode 或 NameNode 之间建立连接或读取数据时的套接字(socket)超时时间,单位为毫秒。该参数的默认值通常为 60,000 毫秒。
 
@@ -136,3 +136,20 @@ create catalog regression properties (
 
 总之,合理配置 `dfs.client.socket-timeout` 参数,可以在提高 I/O 响应速度的同时,确保系统的稳定性和可靠性。
 
+## 调试 HDFS
+
+Hadoop 环境配置复杂,某些情况下可能出现无法连通、访问性能不佳等问题。这里提供一些第三方工具帮助用户快速排查连通性问题和基础的性能问题。
+
+### HDFS Client
+
+- Java:[https://github.com/morningman/hdfs-client-java](https://github.com/morningman/hdfs-client-java)
+
+- CPP: [https://github.com/morningman/hdfs-client-cpp](https://github.com/morningman/hdfs-client-cpp)
+
+这两个工具可以用于快速验证 HDFS 连通性和读取性能。其中的大部分 Hadoop 依赖项和 Doris 本身的 Hadoop 依赖相同,因此可以最大程度模拟 Doris 访问 HDFS 的场景。
+
+Java 版本通过 Java 访问 HDFS,可以模拟 Doris FE 侧访问 HDFS 的逻辑。
+
+CPP 版本通过 C++ 调用 libhdfs 访问 HDFS,可以模拟 Doris BE 侧访问 HDFS 的逻辑。
+
+具体使用方式可以参考各自代码库的 README。
\ No newline at end of file
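
As a complement to the tools documented above, a plain TCP probe can rule out network-level failures before running either client. The sketch below is an illustration under stated assumptions, not code from hdfs-client-java: the class name `NameNodeProbe`, the host `namenode.example.com`, and port 8020 (a commonly used NameNode RPC port; the real value comes from `fs.defaultFS`) are all hypothetical. It only confirms that the port accepts TCP connections. It does not speak the HDFS RPC protocol and does not measure read performance.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodeProbe {
    /**
     * Tries to open a TCP connection to host:port.
     * Returns the connect latency in milliseconds, or -1 if the
     * connection failed or did not complete within timeoutMs.
     */
    public static long probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real NameNode host and port as arguments.
        String host = args.length > 0 ? args[0] : "namenode.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;

        long latencyMs = probe(host, port, 3000);
        if (latencyMs >= 0) {
            System.out.println("TCP connect to " + host + ":" + port + " succeeded in " + latencyMs + " ms");
        } else {
            System.out.println("TCP connect to " + host + ":" + port + " failed");
        }
    }
}
```

If the probe fails, the problem is likely in DNS, routing, or firewall rules rather than in Hadoop configuration. If it succeeds, the Java or CPP client is the next step for checking the HDFS-level behavior.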

