This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git

The following commit(s) were added to refs/heads/master by this push:
     new 095f71b0de [doc][arrow-flight-sql-connect]Update and Correct Arrow Flight SQL Documentation (#822)
095f71b0de is described below

commit 095f71b0de4e3d281911718e54878944c792d948
Author: bingquanzhao <bingquan_z...@icloud.com>
AuthorDate: Thu Jul 11 00:12:15 2024 +0800

    [doc][arrow-flight-sql-connect]Update and Correct Arrow Flight SQL Documentation (#822)

    Add supplementary content and make corrections to the Arrow Flight SQL documentation.

    ---------

    Co-authored-by: Luzhijing <82810928+luzhij...@users.noreply.github.com>
---
 ...in-apache-doris-for-10x-faster-data-transfer.md |  6 ++---
 docs/db-connect/arrow-flight-sql-connect.md        | 17 +++++++++++--
 .../current/db-connect/arrow-flight-sql-connect.md | 28 +++++++++++++++-------
 .../db-connect/arrow-flight-sql-connect.md         | 24 ++++++++++++++-----
 .../db-connect/arrow-flight-sql-connect.md         | 17 +++++++++++--
 5 files changed, 71 insertions(+), 21 deletions(-)

diff --git a/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md b/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md
index 01452b5981..5e985eed75 100644
--- a/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md
+++ b/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md
@@ -92,7 +92,7 @@ Configure parameters for Doris frontend (FE) and backend (BE):
 - In `fe/conf/fe.conf`, set `arrow_flight_sql_port ` to an available port, such as 9090.
-- In `be/conf/be.conf`, set `arrow_flight_port ` to an available port, such as 9091.
+- In `be/conf/be.conf`, set `arrow_flight_sql_port ` to an available port, such as 9091.
 Suppose that the Arrow Flight SQL services for the Doris instance will run on ports 9090 and 9091 for FE and BE respectively, and the Doris username/password is "user" and "pass", the connection process would be:
@@ -245,7 +245,7 @@ import adbc_driver_flightsql.dbapi as flight_sql
 # step 2, create a client that interacts with the Doris Arrow Flight SQL service.
 # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-# Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+# Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
 conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={
     adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
     adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
@@ -313,7 +313,7 @@ import adbc_driver_flightsql.dbapi as flight_sql
 import pandas
 from datetime import datetime
-my_uri = "grpc://0.0.0.0:`fe.conf_arrow_flight_port`"
+my_uri = "grpc://0.0.0.0:`fe.conf_arrow_flight_sql_port`"
 my_db_kwargs = {
     adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
     adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
diff --git a/docs/db-connect/arrow-flight-sql-connect.md b/docs/db-connect/arrow-flight-sql-connect.md
index 25c6e33b5f..31e391e575 100644
--- a/docs/db-connect/arrow-flight-sql-connect.md
+++ b/docs/db-connect/arrow-flight-sql-connect.md
@@ -66,7 +66,9 @@ Create a client to interact with the Doris Arrow Flight SQL service. You need to
 Modify the configuration parameters of Doris FE and BE:
 - Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-- Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+- Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
+
+`Note: arrow_flight_sql_port configured in fe.conf and be.conf are different`
 Assuming that the Arrow Flight SQL services of FE and BE in the Doris instance will run on ports 9090 and 9091 respectively, and the Doris username/password is "user"/"pass", the connection process is as follows:
@@ -219,7 +221,7 @@ import adbc_driver_flightsql.dbapi as flight_sql
 # step 2, create a client that interacts with the Doris Arrow Flight SQL service.
 # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-# Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+# Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
 conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={
     adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
     adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
@@ -391,6 +393,17 @@ connection.close();
 ### JDBC Driver
+When using Java 9 or later, some JDK internals must be exposed by adding --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED to the java command:
+
+```shell
+# Directly on the command line
+$ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
+# Indirectly via environment variables
+$ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
+```
+
+Otherwise, you may see errors like `module java.base does not "opens java.nio" to unnamed module` or `module java.base does not "opens java.nio" to org.apache.arrow.memory.core`
+
 The connection code example is as follows:
 ```Java
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md
index 5b6e2a5a68..a5991553a7 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md
@@ -32,13 +32,13 @@ Doris 基于 Arrow Flight SQL 协议实现了高速数据链路,支持多种
 ## 用途
-从 Doris 加载大批量数据到其他组件,如 Python/Java/Spark/Flink,可以使用基于 Arrow Flight SQL 的 ADBC/JDBC 替代过去的 JDBC/Pymysql/Pandas 来获得更高的读取性能,这在数据科学、数据湖分析等场景中经常遇到。
+从 Doris 加载大批量数据到其他组件,如 Python/Java/Spark/Flink,可以使用基于 Arrow Flight SQL 的 ADBC/JDBC 替代过去的 JDBC/PyMySQL/Pandas 来获得更高的读取性能,这在数据科学、数据湖分析等场景中经常遇到。
 Apache Arrow Flight SQL 是一个由 Apache Arrow 社区开发的与数据库系统交互的协议,用于 ADBC 客户端使用 Arrow 数据格式与实现了 Arrow Flight SQL 协议的数据库交互,具有 Arrow Flight 的速度优势以及 JDBC/ODBC 的易用性。
-Doris 支持 Arrow Flight SQL 的动机、设计与实现、性能测试结果、以及有关 Arrow Flight、ADBC的更多概念可以看 [GitHub Issue](https://github.com/apache/doris/issues/25514),这篇文档主要介绍 Doris Arrow Flight SQL 的使用方法,以及一些常见问题。
+Doris 支持 Arrow Flight SQL 的动机、设计与实现、性能测试结果、以及有关 Arrow Flight、ADBC 的更多概念可以看 [GitHub Issue](https://github.com/apache/doris/issues/25514),这篇文档主要介绍 Doris Arrow Flight SQL 的使用方法,以及一些常见问题。
-安装Apache Arrow 你可以去官方文档(
+安装 Apache Arrow 你可以去官方文档(
 [Apache Arrow](https://arrow.apache.org/install/))找到详细的安装教程。
 ## Python 使用方法
@@ -67,7 +67,9 @@ import adbc_driver_flightsql.dbapi as flight_sql
 修改 Doris FE 和 BE 的配置参数:
 - 修改fe/conf/fe.conf 中 arrow_flight_sql_port 为一个可用端口,如 9090。
-- 修改 be/conf/be.conf中 arrow_flight_port 为一个可用端口,如 9091。
+- 修改 be/conf/be.conf中 arrow_flight_sql_port 为一个可用端口,如 9091。
+
+`注: fe.conf 与 be.conf 中配置的 arrow_flight_sql_port 不相同`
 假设 Doris 实例中 FE 和 BE 的 Arrow Flight SQL 服务将分别在端口 9090 和 9091 上运行,且 Doris 用户名/密码为“user”/“pass”,那么连接过程如下所示:
@@ -220,7 +222,7 @@ import adbc_driver_flightsql.dbapi as flight_sql
 # step 2, create a client that interacts with the Doris Arrow Flight SQL service.
 # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-# Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+# Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
 conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={
     adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
     adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
@@ -273,9 +275,9 @@ execute("select k5, sum(k1), count(1), avg(k3) from arrow_flight_sql_test group
 cursor.close()
 ```
-## Jdbc Connector with Arrow Flight SQL
+## JDBC Connector with Arrow Flight SQL
-Arrow Flight SQL 协议的开源 JDBC 驱动兼容标准的 JDBC API,可用于大多数 BI 工具通过 JDBC 访问 Doris,并支持高速传输 Apache Arrow 数据。使用方法与通过 MySQL 协议的 JDBC 驱动连接 Doris 类似,只需将链接 URL 中的 jdbc:mysql 协议换成 jdbc:arrow-flight-sql协议,查询返回的结果依然是 JDBC 的 ResultSet 数据结构。
+Arrow Flight SQL 协议的开源 JDBC 驱动兼容标准的 JDBC API,可用于大多数 BI 工具通过 JDBC 访问 Doris,并支持高速传输 Apache Arrow 数据。使用方法与通过 MySQL 协议的 JDBC 驱动连接 Doris 类似,只需将链接 URL 中的 jdbc:mysql 协议换成 jdbc:arrow-flight-sql 协议,查询返回的结果依然是 JDBC 的 ResultSet 数据结构。
 POM dependency:
 ```Java
@@ -291,6 +293,16 @@ POM dependency:
 </dependencies>
 ```
+使用 Java 9 或更高版本时,必须通过在 Java 命令中添加 --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED 来暴露某些 JDK 内部结构:
+
+```shell
+# Directly on the command line
+$ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
+# Indirectly via environment variables
+$ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
+```
+否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core`
+
 连接代码示例如下:
 ```Java
@@ -319,7 +331,7 @@ conn.close();
 ## Java 使用方法
-除了使用 JDBC,与 Python 类似,JAVA 也可以创建 Driver 读取 Doris 并返回 Arrow 格式的数据,下面分别是使用 AdbcDriver 和 JdbcDriver 连接 Doris Arrow Flight Server。
+除了使用 JDBC,与 Python 类似,Java 也可以创建 Driver 读取 Doris 并返回 Arrow 格式的数据,下面分别是使用 AdbcDriver 和 JdbcDriver 连接 Doris Arrow Flight Server。
 POM dependency:
 ```Java
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md
index 84260efc2f..3f94f432d0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md
@@ -32,14 +32,14 @@ Doris 基于 Arrow Flight SQL 协议实现了高速数据链路,支持多种
 ## 用途
-从 Doris 加载大批量数据到其他组件,如 Python/Java/Spark/Flink,可以使用基于 Arrow Flight SQL 的 ADBC/JDBC 替代过去的 JDBC/Pymysql/Pandas 来获得更高的读取性能,这在数据科学、数据湖分析等场景中经常遇到。
+从 Doris 加载大批量数据到其他组件,如 Python/Java/Spark/Flink,可以使用基于 Arrow Flight SQL 的 ADBC/JDBC 替代过去的 JDBC/PyMySQL/Pandas 来获得更高的读取性能,这在数据科学、数据湖分析等场景中经常遇到。
 Apache Arrow Flight SQL 是一个由 Apache Arrow 社区开发的与数据库系统交互的协议,用于 ADBC 客户端使用 Arrow 数据格式与实现了 Arrow Flight SQL 协议的数据库交互,具有 Arrow Flight 的速度优势以及 JDBC/ODBC 的易用性。
-Doris 支持 Arrow Flight SQL 的动机、设计与实现、性能测试结果、以及有关 Arrow Flight、ADBC的更多概念可以看:[GitHub Issue](https://github.com/apache/doris/issues/25514),这篇文档主要介绍 Doris Arrow Flight SQL 的使用方法,以及一些常见问题。
+Doris 支持 Arrow Flight SQL 的动机、设计与实现、性能测试结果、以及有关 Arrow Flight、ADBC 的更多概念可以看:[GitHub Issue](https://github.com/apache/doris/issues/25514),这篇文档主要介绍 Doris Arrow Flight SQL 的使用方法,以及一些常见问题。
 安装Apache Arrow 你可以去官方文档(
-[Apache Arrow](https://arrow.apache.org/install/))找到详细的安装教程
+[Apache Arrow](https://arrow.apache.org/install/))找到详细的安装教程。
 ## Python 使用方法
@@ -67,7 +67,9 @@ import adbc_driver_flightsql.dbapi as flight_sql
 修改 Doris FE 和 BE 的配置参数:
 - 修改fe/conf/fe.conf 中 arrow_flight_sql_port 为一个可用端口,如 9090。
-- 修改 be/conf/be.conf中 arrow_flight_port 为一个可用端口,如 9091。
+- 修改 be/conf/be.conf中 arrow_flight_sql_port 为一个可用端口,如 9091。
+
+`注: fe.conf 与 be.conf 中配置的 arrow_flight_sql_port 不相同`
 假设 Doris 实例中 FE 和 BE 的 Arrow Flight SQL 服务将分别在端口 9090 和 9091 上运行,且 Doris 用户名/密码为“user”/“pass”,那么连接过程如下所示:
@@ -220,7 +222,7 @@ import adbc_driver_flightsql.dbapi as flight_sql
 # step 2, create a client that interacts with the Doris Arrow Flight SQL service.
 # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-# Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+# Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
 conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={
     adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
     adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
@@ -291,6 +293,16 @@ POM dependency:
 </dependencies>
 ```
+使用 Java 9 或更高版本时,必须通过在 Java 命令中添加 --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED 来暴露某些 JDK 内部结构:
+
+```shell
+# Directly on the command line
+$ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
+# Indirectly via environment variables
+$ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
+```
+否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core`
+
 连接代码示例如下:
 ```Java
@@ -319,7 +331,7 @@ conn.close();
 ## Java 使用方法
-除了使用 JDBC,与 Python 类似,JAVA 也可以创建 Driver 读取 Doris 并返回 Arrow 格式的数据,下面分别是使用 AdbcDriver 和 JdbcDriver 连接 Doris Arrow Flight Server。
+除了使用 JDBC,与 Python 类似,Java 也可以创建 Driver 读取 Doris 并返回 Arrow 格式的数据,下面分别是使用 AdbcDriver 和 JdbcDriver 连接 Doris Arrow Flight Server。
 POM dependency:
 ```Java
diff --git a/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md b/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md
index 003a7500b4..8f226fc9db 100644
--- a/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md
+++ b/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md
@@ -66,7 +66,9 @@ Create a client to interact with the Doris Arrow Flight SQL service. You need to
 Modify the configuration parameters of Doris FE and BE:
 - Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-- Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+- Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
+
+`Note: arrow_flight_sql_port configured in fe.conf and be.conf are different`
 Assuming that the Arrow Flight SQL services of FE and BE in the Doris instance will run on ports 9090 and 9091 respectively, and the Doris username/password is "user"/"pass", the connection process is as follows:
@@ -219,7 +221,7 @@ import adbc_driver_flightsql.dbapi as flight_sql
 # step 2, create a client that interacts with the Doris Arrow Flight SQL service.
 # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090.
-# Modify arrow_flight_port in be/conf/be.conf to an available port, such as 9091.
+# Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
 conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={
     adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
     adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
@@ -391,6 +393,17 @@ connection.close();
 ### JDBC Driver
+When using Java 9 or later, some JDK internals must be exposed by adding --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED to the java command:
+
+```shell
+# Directly on the command line
+$ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
+# Indirectly via environment variables
+$ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
+```
+
+Otherwise, you may see errors like `module java.base does not "opens java.nio" to unnamed module` or `module java.base does not "opens java.nio" to org.apache.arrow.memory.core`
+
 The connection code example is as follows:
 ```Java
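
The `Note` line added by this commit points out that `arrow_flight_sql_port` is set separately in fe.conf and be.conf, with 9090 and 9091 as the example values. A minimal sketch of checking both files before building the FE connection URI might look like the following. It assumes plain `key = value` lines with `#` comments; `read_port`, `check_ports`, and `fe_uri` are hypothetical helper names, not part of Doris or ADBC, and the equal-port check only matters when FE and BE share a host, since two processes cannot listen on the same port there.

```python
from typing import Optional, Tuple

PORT_KEY = "arrow_flight_sql_port"


def read_port(conf_text: str, key: str = PORT_KEY) -> Optional[int]:
    """Return the last uncommented integer value of `key`, or None if absent."""
    port = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name == key and value:
            port = int(value)
    return port


def check_ports(fe_conf: str, be_conf: str, same_host: bool = True) -> Tuple[int, int]:
    """Read both ports; reject a collision when FE and BE run on one host."""
    fe_port = read_port(fe_conf)
    be_port = read_port(be_conf)
    if fe_port is None or be_port is None:
        raise ValueError(f"{PORT_KEY} must be set in both fe.conf and be.conf")
    if same_host and fe_port == be_port:
        raise ValueError(f"FE and BE both use port {fe_port} on the same host")
    return fe_port, be_port


def fe_uri(host: str, fe_port: int) -> str:
    """Build the grpc URI that the client examples pass to flight_sql.connect."""
    return f"grpc://{host}:{fe_port}"


if __name__ == "__main__":
    # Illustrative config contents using the example ports from the diff.
    fe_text = "arrow_flight_sql_port = 9090\n"
    be_text = "arrow_flight_sql_port = 9091\n"
    fe_port, _ = check_ports(fe_text, be_text)
    print(fe_uri("127.0.0.1", fe_port))
```

Clients connect to the FE port only, so the sketch builds the URI from the fe.conf value and uses the be.conf value solely for the collision check.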