This is an automated email from the ASF dual-hosted git repository. yiguolei pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 9831b94961 [doc] Fix Arrow Flight docs (#982) 9831b94961 is described below commit 9831b949611715e328c718b1d3cfc4210f7402a0 Author: Xinyi Zou <zouxiny...@gmail.com> AuthorDate: Sat Aug 17 18:10:52 2024 +0800 [doc] Fix Arrow Flight docs (#982) --- ...in-apache-doris-for-10x-faster-data-transfer.md | 11 ++++-- blog/release-note-2.1.0.md | 2 +- common_docs_zh/releasenotes/v2.1/release-2.1.0.md | 2 +- docs/db-connect/arrow-flight-sql-connect.md | 31 ++++++++++++---- .../current/db-connect/arrow-flight-sql-connect.md | 32 +++++++++++++---- .../db-connect/arrow-flight-sql-connect.md | 42 +++++++++++++++------- .../db-connect/arrow-flight-sql-connect.md | 32 +++++++++++++---- .../db-connect/arrow-flight-sql-connect.md | 38 +++++++++++++++----- .../db-connect/arrow-flight-sql-connect.md | 31 ++++++++++++---- 9 files changed, 168 insertions(+), 53 deletions(-) diff --git a/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md b/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md index 5e985eed75..be6154f434 100644 --- a/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md +++ b/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md @@ -82,6 +82,11 @@ Import the following module/library to interact with the installed library: ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### 02 Connect to Doris @@ -97,7 +102,7 @@ Configure parameters for Doris frontend (FE) and backend (BE): Suppose that the Arrow Flight SQL services for the Doris instance will run on ports 9090 and 9091 for FE and BE respectively, and the Doris username/password is "user" and "pass", the connection process would be: ```C++ -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = 
flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -246,7 +251,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091. -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -401,7 +406,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; diff --git a/blog/release-note-2.1.0.md b/blog/release-note-2.1.0.md index 84af6b2678..7fee163d1e 100644 --- a/blog/release-note-2.1.0.md +++ b/blog/release-note-2.1.0.md @@ -165,7 +165,7 @@ Now this is revolutionized in Doris V2.1, where we provide a high-throughput dat This allows fast data access to Apache Doris by data science tools like Pandas and Numpy, which means Apache Doris can be seamlessly integrated with the entire AI and data science ecosystem. This unveils a future of endless possibilities. 
```C++ -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) diff --git a/common_docs_zh/releasenotes/v2.1/release-2.1.0.md b/common_docs_zh/releasenotes/v2.1/release-2.1.0.md index 7fc3a26520..f97361fbf0 100644 --- a/common_docs_zh/releasenotes/v2.1/release-2.1.0.md +++ b/common_docs_zh/releasenotes/v2.1/release-2.1.0.md @@ -161,7 +161,7 @@ under the License. 基于此,Apache Doris 可以与整个 AI 和数据科学生态进行良好的整合,这也是未来的重要发展方向。 ```C++ -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) diff --git a/docs/db-connect/arrow-flight-sql-connect.md b/docs/db-connect/arrow-flight-sql-connect.md index 8ca606ab65..0c9e326f5f 100644 --- a/docs/db-connect/arrow-flight-sql-connect.md +++ b/docs/db-connect/arrow-flight-sql-connect.md @@ -58,6 +58,11 @@ Import the following modules/libraries in the code to use the installed Library: ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### Connect to Doris @@ -73,7 +78,7 @@ Modify the configuration parameters of Doris FE and BE: Assuming that the Arrow Flight SQL services of FE and BE in the Doris instance will run on ports 9090 and 9091 respectively, and the Doris username/password is "user"/"pass", the connection process is as follows: ```Python -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ 
adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -222,7 +227,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091. -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -301,7 +306,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; @@ -366,7 +371,7 @@ The connection code example is as follows: final BufferAllocator allocator = new RootAllocator(); FlightSqlDriver driver = new FlightSqlDriver(allocator); Map<String, Object> parameters = new HashMap<>(); -AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("0.0.0.0", 9090).getUri().toString()); +AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("{FE_HOST}", {fe.conf:arrow_flight_sql_port}).getUri().toString()); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); AdbcDatabase adbcDatabase = driver.open(parameters); @@ -407,14 +412,18 @@ $ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED - $ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ... 
``` -Otherwise, you may see errors like `module java.base does not "opens java.nio" to unnamed module` or `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` +Otherwise, you may see some errors such as `module java.base does not "opens java.nio" to unnamed module` or `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` or `java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.util.MemoryUtil (Internal; Prepare)` + +If you debug in IntelliJ IDEA, you need to add `--add-opens=java.base/java.nio=ALL-UNNAMED` in `Build and run` of `Run/Debug Configurations`; refer to the picture below: + + The connection code example is as follows: ```Java final Map<String, Object> parameters = new HashMap<>(); AdbcDriver.PARAM_URI.set( - parameters,"jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); + parameters,"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); try ( @@ -479,4 +488,12 @@ The Linux kernel version of kylinv10 SP2 and SP3 is only up to 4.19.90-24.4.v210 4. ADBC v0.10, JDBC and Java ADBC/JDBCDriver do not support parallel reading, and the `stmt.executePartitioned()` method is not implemented. You can only use the native FlightClient to implement parallel reading of multiple Endpoints, using the method `sqlClient=new FlightSqlClient, execute=sqlClient.execute(sql), endpoints=execute.getEndpoints(), for(FlightEndpoint endpoint: endpoints)`. In addition, the default AdbcStatement of ADBC V0.10 is actually JdbcStatement. After executeQue [...] -5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL.
For example, `jdbc:arrow-flight-sql://0.0.0.0:9090/test?useServerPrepStmts=false` specifies that the connection to the `test` database is invalid. You can only execute the SQL `use database` manually. +5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL. For example, `jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}/test?useServerPrepStmts=false` specifies that the connection to the `test` database is invalid. You can only execute the SQL `use database` manually. + +6. Doris 2.1.4 has a bug that can cause errors when reading large amounts of data. It is fixed in the PR [Fix arrow flight result sink #36827](https://github.com/apache/doris/pull/36827); upgrading to Doris 2.1.5 solves the problem. For details of the problem, see: [Questions](https://ask.selectdb.com/questions/D1Ia1/arrow-flight-sql-shi-yong-python-de-adbc-driver-lian-jie-doris-zhi-xing-cha-xun-sql-du-qu-bu-dao-shu-ju) + +7. When using Python, you can ignore the warning `Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant`. It is a problem with the Python ADBC Client and does not affect queries. + +8. If Python reports the error `grpc: received message larger than max (20748753 vs. 16777216)`, refer to [Python: grpc: received message larger than max (20748753 vs. 16777216) #2078](https://github.com/apache/arrow-adbc/issues/2078) and add `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value` to the Database Option. + +9. Before Doris version 2.1.7, the error `Reach limit of connections` may be reported. This is because the number of Arrow Flight connections for a single user is not restricted to stay below `max_user_connections` in `UserProperty`, which is 100 by default.
You can raise the maximum number of connections for the user Billie to 1000 with `SET PROPERTY FOR 'Billie' 'max_user_connections' = '1000';`, or add `arrow_flight_token_cache_size=50` in `fe.conf` to limit the overall number [...] diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md index 0615f7760c..febbb31207 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/db-connect/arrow-flight-sql-connect.md @@ -59,6 +59,11 @@ pip install adbc_driver_flightsql ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### 连接 Doris @@ -74,7 +79,7 @@ import adbc_driver_flightsql.dbapi as flight_sql 假设 Doris 实例中 FE 和 BE 的 Arrow Flight SQL 服务将分别在端口 9090 和 9091 上运行,且 Doris 用户名/密码为“user”/“pass”,那么连接过程如下所示: ```Python -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -223,7 +228,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091.
-conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -301,7 +306,12 @@ $ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED - # Indirectly via environment variables $ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ... ``` -否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` + +否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` 或者 `ava.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.util.MemoryUtil (Internal; Prepare)` + +如果您在 IntelliJ IDEA 中调试,需要在 `Run/Debug Configurations` 的 `Build and run` 中增加 `--add-opens=java.base/java.nio=ALL-UNNAMED`,参照下面的图片: + + 连接代码示例如下: @@ -312,7 +322,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; @@ -377,7 +387,7 @@ POM dependency: final BufferAllocator allocator = new RootAllocator(); FlightSqlDriver driver = new FlightSqlDriver(allocator); Map<String, Object> parameters = new HashMap<>(); -AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("0.0.0.0", 9090).getUri().toString()); +AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("{FE_HOST}", {fe.conf:arrow_flight_sql_port}).getUri().toString()); 
AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); AdbcDatabase adbcDatabase = driver.open(parameters); @@ -414,7 +424,7 @@ connection.close(); ```Java final Map<String, Object> parameters = new HashMap<>(); AdbcDriver.PARAM_URI.set( - parameters,"jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); + parameters,"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); try ( @@ -479,4 +489,12 @@ kylinv10 SP2 和 SP3 的 Linux 内核版本最高只有 4.19.90-24.4.v2101.ky10. 4. ADBC v0.10,JDBC 和 Java ADBC/JDBCDriver 还不支持并行读取,没有实现`stmt.executePartitioned()`这个方法,只能使用原生的 FlightClient 实现并行读取多个 Endpoints, 使用方法`sqlClient=new FlightSqlClient, execute=sqlClient.execute(sql), endpoints=execute.getEndpoints(), for(FlightEndpoint endpoint: endpoints)`,此外,ADBC V0.10 默认的AdbcStatement实际是JdbcStatement,executeQuery后将行存格式的 JDBC ResultSet 又重新转成的Arrow列存格式,预期到 ADBC 1.0.0 时 Java ADBC 将功能完善 [GitHub Issue](https://github.com/apache/arrow-adbc/issues/1490)。 -5. 截止Arrow v15.0,Arrow JDBC Connector 不支持在 URL 中指定 database name,比如 `jdbc:arrow-flight-sql://0.0.0.0:9090/test?useServerPrepStmts=false` 中指定连接`test` database无效,只能手动执行SQL `use database`。 +5. 截止Arrow v15.0,Arrow JDBC Connector 不支持在 URL 中指定 database name,比如 `jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}/test?useServerPrepStmts=false` 中指定连接`test` database无效,只能手动执行SQL `use database`。 + +6. Doris 2.1.4 version 存在一个Bug,读取大数据量时有几率报错,在 [Fix arrow flight result sink #36827](https://github.com/apache/doris/pull/36827) 修复,升级 Doris 2.1.5 version 可以解决。问题详情见:[Questions](https://ask.selectdb.com/questions/D1Ia1/arrow-flight-sql-shi-yong-python-de-adbc-driver-lian-jie-doris-zhi-xing-cha-xun-sql-du-qu-bu-dao-shu-ju) + +7. 
`Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant` 使用 Python 时忽略这个 Warning,这是 Python ADBC Client 的问题,这不会影响查询。 + +8. Python 报错 `grpc: received message larger than max (20748753 vs. 16777216)`,参考 [Python: grpc: received message larger than max (20748753 vs. 16777216) #2078](https://github.com/apache/arrow-adbc/issues/2078) 在 Database Option 中增加 `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value`. + +9. Doris version 2.1.7 版本之前,报错 `Reach limit of connections`,这是因为没有限制单个用户的 Arrow Flight 连接数小于 `UserProperty` 中的 `max_user_connections`,默认100,可以通过 `SET PROPERTY FOR 'Billie' 'max_user_connections' = '1000';` 修改 Billie 用户的当前最大连接数到 100,或者在 `fe.conf` 中增加 `arrow_flight_token_cache_size=50` 来限制整体的 Arrow Flight 连接数。Doris version 2.1.7 版本之前 Arrow Flight 连接默认 3天 超时断开,只强制连接数小于 `qe_max_connection/2`,超过时依据lru淘汰,`qe_max_connection` 是fe所有用户的总连接数,默认1024。具体可以看 `arrow_flight_token_cache_size` 这个conf的介绍。在 [...] diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md index 5949d35b0c..e1091e9519 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/db-connect/arrow-flight-sql-connect.md @@ -36,10 +36,10 @@ Doris 基于 Arrow Flight SQL 协议实现了高速数据链路,支持多种 Apache Arrow Flight SQL 是一个由 Apache Arrow 社区开发的与数据库系统交互的协议,用于 ADBC 客户端使用 Arrow 数据格式与实现了 Arrow Flight SQL 协议的数据库交互,具有 Arrow Flight 的速度优势以及 JDBC/ODBC 的易用性。 -Doris 支持 Arrow Flight SQL 的动机、设计与实现、性能测试结果、以及有关 Arrow Flight、ADBC 的更多概念可以看:[GitHub Issue](https://github.com/apache/doris/issues/25514),这篇文档主要介绍 Doris Arrow Flight SQL 的使用方法,以及一些常见问题。 +Doris 支持 Arrow Flight SQL 的动机、设计与实现、性能测试结果、以及有关 Arrow Flight、ADBC 的更多概念可以看 [GitHub Issue](https://github.com/apache/doris/issues/25514),这篇文档主要介绍 Doris Arrow Flight SQL 的使用方法,以及一些常见问题。 -安装Apache Arrow 你可以去官方文档( -[Apache 
Arrow](https://arrow.apache.org/install/))找到详细的安装教程。 +安装 Apache Arrow 你可以去官方文档( +[Apache Arrow](https://arrow.apache.org/install/))找到详细的安装教程。 ## Python 使用方法 @@ -59,6 +59,11 @@ pip install adbc_driver_flightsql ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### 连接 Doris @@ -74,7 +79,7 @@ import adbc_driver_flightsql.dbapi as flight_sql 假设 Doris 实例中 FE 和 BE 的 Arrow Flight SQL 服务将分别在端口 9090 和 9091 上运行,且 Doris 用户名/密码为“user”/“pass”,那么连接过程如下所示: ```Python -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -223,7 +228,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091. 
-conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -275,9 +280,9 @@ execute("select k5, sum(k1), count(1), avg(k3) from arrow_flight_sql_test group cursor.close() ``` -## Jdbc Connector with Arrow Flight SQL +## JDBC Connector with Arrow Flight SQL -Arrow Flight SQL 协议的开源 JDBC 驱动兼容标准的 JDBC API,可用于大多数 BI 工具通过 JDBC 访问 Doris,并支持高速传输 Apache Arrow 数据。使用方法与通过 MySQL 协议的 JDBC 驱动连接 Doris 类似,只需将链接 URL 中的 jdbc:mysql 协议换成 jdbc:arrow-flight-sql协议,查询返回的结果依然是 JDBC 的 ResultSet 数据结构。 +Arrow Flight SQL 协议的开源 JDBC 驱动兼容标准的 JDBC API,可用于大多数 BI 工具通过 JDBC 访问 Doris,并支持高速传输 Apache Arrow 数据。使用方法与通过 MySQL 协议的 JDBC 驱动连接 Doris 类似,只需将链接 URL 中的 jdbc:mysql 协议换成 jdbc:arrow-flight-sql 协议,查询返回的结果依然是 JDBC 的 ResultSet 数据结构。 POM dependency: ```Java @@ -301,7 +306,12 @@ $ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED - # Indirectly via environment variables $ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ... 
``` -否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` + +否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` 或者 `ava.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.util.MemoryUtil (Internal; Prepare)` + +如果您在 IntelliJ IDEA 中调试,需要在 `Run/Debug Configurations` 的 `Build and run` 中增加 `--add-opens=java.base/java.nio=ALL-UNNAMED`,参照下面的图片: + + 连接代码示例如下: @@ -312,7 +322,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; @@ -377,7 +387,7 @@ POM dependency: final BufferAllocator allocator = new RootAllocator(); FlightSqlDriver driver = new FlightSqlDriver(allocator); Map<String, Object> parameters = new HashMap<>(); -AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("0.0.0.0", 9090).getUri().toString()); +AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("{FE_HOST}", {fe.conf:arrow_flight_sql_port}).getUri().toString()); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); AdbcDatabase adbcDatabase = driver.open(parameters); @@ -414,7 +424,7 @@ connection.close(); ```Java final Map<String, Object> parameters = new HashMap<>(); AdbcDriver.PARAM_URI.set( - parameters,"jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); + 
parameters,"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); try ( @@ -479,4 +489,12 @@ kylinv10 SP2 和 SP3 的 Linux 内核版本最高只有 4.19.90-24.4.v2101.ky10. 4. ADBC v0.10,JDBC 和 Java ADBC/JDBCDriver 还不支持并行读取,没有实现`stmt.executePartitioned()`这个方法,只能使用原生的 FlightClient 实现并行读取多个 Endpoints, 使用方法`sqlClient=new FlightSqlClient, execute=sqlClient.execute(sql), endpoints=execute.getEndpoints(), for(FlightEndpoint endpoint: endpoints)`,此外,ADBC V0.10 默认的AdbcStatement实际是JdbcStatement,executeQuery后将行存格式的 JDBC ResultSet 又重新转成的Arrow列存格式,预期到 ADBC 1.0.0 时 Java ADBC 将功能完善 [GitHub Issue](https://github.com/apache/arrow-adbc/issues/1490)。 -5. 截止Arrow v15.0,Arrow JDBC Connector 不支持在 URL 中指定 database name,比如 `jdbc:arrow-flight-sql://0.0.0.0:9090/test?useServerPrepStmts=false` 中指定连接`test` database无效,只能手动执行SQL `use database`。 +5. 截止Arrow v15.0,Arrow JDBC Connector 不支持在 URL 中指定 database name,比如 `jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}/test?useServerPrepStmts=false` 中指定连接`test` database无效,只能手动执行SQL `use database`。 + +6. Doris 2.1.4 version 存在一个Bug,读取大数据量时有几率报错,在 [Fix arrow flight result sink #36827](https://github.com/apache/doris/pull/36827) 这个pr修复,升级 Doris 2.1.5 version 可以解决。问题详情见:[Questions](https://ask.selectdb.com/questions/D1Ia1/arrow-flight-sql-shi-yong-python-de-adbc-driver-lian-jie-doris-zhi-xing-cha-xun-sql-du-qu-bu-dao-shu-ju) + +7. `Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant` 使用 Python 时忽略这个 Warning,这是 Python ADBC Client 的问题,这不会影响查询。 + +8. Python 报错 `grpc: received message larger than max (20748753 vs. 16777216)`,参考 [Python: grpc: received message larger than max (20748753 vs. 16777216) #2078](https://github.com/apache/arrow-adbc/issues/2078) 在 Database Option 中增加 `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value`. + +9. 
Doris version 2.1.7 版本之前,报错 `Reach limit of connections`,这是因为没有限制单个用户的 Arrow Flight 连接数小于 `UserProperty` 中的 `max_user_connections`,默认100,可以通过 `SET PROPERTY FOR 'Billie' 'max_user_connections' = '1000';` 修改 Billie 用户的当前最大连接数到 100,或者在 `fe.conf` 中增加 `arrow_flight_token_cache_size=50` 来限制整体的 Arrow Flight 连接数。Doris version 2.1.7 版本之前 Arrow Flight 连接默认 3天 超时断开,只强制连接数小于 `qe_max_connection/2`,超过时依据lru淘汰,`qe_max_connection` 是fe所有用户的总连接数,默认1024。具体可以看 `arrow_flight_token_cache_size` 这个conf的介绍。在 [...] diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/db-connect/arrow-flight-sql-connect.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/db-connect/arrow-flight-sql-connect.md index 0615f7760c..e1091e9519 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/db-connect/arrow-flight-sql-connect.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/db-connect/arrow-flight-sql-connect.md @@ -59,6 +59,11 @@ pip install adbc_driver_flightsql ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### 连接 Doris @@ -74,7 +79,7 @@ import adbc_driver_flightsql.dbapi as flight_sql 假设 Doris 实例中 FE 和 BE 的 Arrow Flight SQL 服务将分别在端口 9090 和 9091 上运行,且 Doris 用户名/密码为“user”/“pass”,那么连接过程如下所示: ```Python -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -223,7 +228,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091. 
-conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -301,7 +306,12 @@ $ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED - # Indirectly via environment variables $ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ... ``` -否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` + +否则,您可能会看到一些错误,如 `module java.base does not "opens java.nio" to unnamed module` 或者 `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` 或者 `ava.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.util.MemoryUtil (Internal; Prepare)` + +如果您在 IntelliJ IDEA 中调试,需要在 `Run/Debug Configurations` 的 `Build and run` 中增加 `--add-opens=java.base/java.nio=ALL-UNNAMED`,参照下面的图片: + + 连接代码示例如下: @@ -312,7 +322,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; @@ -377,7 +387,7 @@ POM dependency: final BufferAllocator allocator = new RootAllocator(); FlightSqlDriver driver = new FlightSqlDriver(allocator); Map<String, Object> parameters = new HashMap<>(); -AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("0.0.0.0", 9090).getUri().toString()); +AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("{FE_HOST}", {fe.conf:arrow_flight_sql_port}).getUri().toString()); 
AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); AdbcDatabase adbcDatabase = driver.open(parameters); @@ -414,7 +424,7 @@ connection.close(); ```Java final Map<String, Object> parameters = new HashMap<>(); AdbcDriver.PARAM_URI.set( - parameters,"jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); + parameters,"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); try ( @@ -479,4 +489,12 @@ kylinv10 SP2 和 SP3 的 Linux 内核版本最高只有 4.19.90-24.4.v2101.ky10. 4. ADBC v0.10,JDBC 和 Java ADBC/JDBCDriver 还不支持并行读取,没有实现`stmt.executePartitioned()`这个方法,只能使用原生的 FlightClient 实现并行读取多个 Endpoints, 使用方法`sqlClient=new FlightSqlClient, execute=sqlClient.execute(sql), endpoints=execute.getEndpoints(), for(FlightEndpoint endpoint: endpoints)`,此外,ADBC V0.10 默认的AdbcStatement实际是JdbcStatement,executeQuery后将行存格式的 JDBC ResultSet 又重新转成的Arrow列存格式,预期到 ADBC 1.0.0 时 Java ADBC 将功能完善 [GitHub Issue](https://github.com/apache/arrow-adbc/issues/1490)。 -5. 截止Arrow v15.0,Arrow JDBC Connector 不支持在 URL 中指定 database name,比如 `jdbc:arrow-flight-sql://0.0.0.0:9090/test?useServerPrepStmts=false` 中指定连接`test` database无效,只能手动执行SQL `use database`。 +5. 截止Arrow v15.0,Arrow JDBC Connector 不支持在 URL 中指定 database name,比如 `jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}/test?useServerPrepStmts=false` 中指定连接`test` database无效,只能手动执行SQL `use database`。 + +6. Doris 2.1.4 version 存在一个Bug,读取大数据量时有几率报错,在 [Fix arrow flight result sink #36827](https://github.com/apache/doris/pull/36827) 这个pr修复,升级 Doris 2.1.5 version 可以解决。问题详情见:[Questions](https://ask.selectdb.com/questions/D1Ia1/arrow-flight-sql-shi-yong-python-de-adbc-driver-lian-jie-doris-zhi-xing-cha-xun-sql-du-qu-bu-dao-shu-ju) + +7. 
`Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant` 使用 Python 时忽略这个 Warning,这是 Python ADBC Client 的问题,这不会影响查询。 + +8. Python 报错 `grpc: received message larger than max (20748753 vs. 16777216)`,参考 [Python: grpc: received message larger than max (20748753 vs. 16777216) #2078](https://github.com/apache/arrow-adbc/issues/2078) 在 Database Option 中增加 `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value`. + +9. Doris version 2.1.7 版本之前,报错 `Reach limit of connections`,这是因为没有限制单个用户的 Arrow Flight 连接数小于 `UserProperty` 中的 `max_user_connections`,默认100,可以通过 `SET PROPERTY FOR 'Billie' 'max_user_connections' = '1000';` 修改 Billie 用户的当前最大连接数到 100,或者在 `fe.conf` 中增加 `arrow_flight_token_cache_size=50` 来限制整体的 Arrow Flight 连接数。Doris version 2.1.7 版本之前 Arrow Flight 连接默认 3天 超时断开,只强制连接数小于 `qe_max_connection/2`,超过时依据lru淘汰,`qe_max_connection` 是fe所有用户的总连接数,默认1024。具体可以看 `arrow_flight_token_cache_size` 这个conf的介绍。在 [...] diff --git a/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md b/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md index 8f226fc9db..0c9e326f5f 100644 --- a/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md +++ b/versioned_docs/version-2.1/db-connect/arrow-flight-sql-connect.md @@ -58,6 +58,11 @@ Import the following modules/libraries in the code to use the installed Library: ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### Connect to Doris @@ -73,7 +78,7 @@ Modify the configuration parameters of Doris FE and BE: Assuming that the Arrow Flight SQL services of FE and BE in the Doris instance will run on ports 9090 and 9091 respectively, and the Doris username/password is "user"/"pass", the connection process is as follows: ```Python -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = 
flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -222,7 +227,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091. -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -301,7 +306,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; @@ -349,6 +354,11 @@ POM dependency: <artifactId>adbc-sql</artifactId> <version>${adbc.version}</version> </dependency> + <dependency> + <groupId>org.apache.arrow.adbc</groupId> + <artifactId>adbc-driver-flight-sql</artifactId> + <version>${adbc.version}</version> + </dependency> </dependencies> ``` @@ -361,7 +371,7 @@ The connection code example is as follows: final BufferAllocator allocator = new RootAllocator(); FlightSqlDriver driver = new FlightSqlDriver(allocator); Map<String, Object> parameters = new HashMap<>(); -AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("0.0.0.0", 9090).getUri().toString()); +AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("{FE_HOST}", 
{fe.conf:arrow_flight_sql_port}).getUri().toString()); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); AdbcDatabase adbcDatabase = driver.open(parameters); @@ -402,14 +412,18 @@ $ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED - $ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ... ``` -Otherwise, you may see errors like `module java.base does not "opens java.nio" to unnamed module` or `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` +Otherwise, you may see errors such as `module java.base does not "opens java.nio" to unnamed module`, `module java.base does not "opens java.nio" to org.apache.arrow.memory.core`, or `java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.util.MemoryUtil (Internal; Prepare)` + +If you debug in IntelliJ IDEA, you need to add `--add-opens=java.base/java.nio=ALL-UNNAMED` under `Build and run` in `Run/Debug Configurations`. Refer to the picture below: + + The connection code example is as follows: ```Java final Map<String, Object> parameters = new HashMap<>(); AdbcDriver.PARAM_URI.set( - parameters,"jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); + parameters,"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); try ( @@ -438,7 +452,7 @@ try ( } ``` -### Choice of JDBC and Java connection methods +### Choice of Jdbc and Java connection methods A performance test comparing the Arrow Flight SQL connection methods of Jdbc and Java with the traditional `jdbc:mysql` connection method can be found at [GitHub Issue](https://github.com/apache/doris/issues/25514). 
Here are some usage suggestions based on the test conclusions. @@ -474,4 +488,12 @@ The Linux kernel version of kylinv10 SP2 and SP3 is only up to 4.19.90-24.4.v210 4. ADBC v0.10, JDBC and Java ADBC/JDBCDriver do not support parallel reading, and the `stmt.executePartitioned()` method is not implemented. You can only use the native FlightClient to implement parallel reading of multiple Endpoints, using the method `sqlClient=new FlightSqlClient, execute=sqlClient.execute(sql), endpoints=execute.getEndpoints(), for(FlightEndpoint endpoint: endpoints)`. In addition, the default AdbcStatement of ADBC V0.10 is actually JdbcStatement. After executeQue [...] -5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL. For example, specifying the `test` database in `jdbc:arrow-flight-sql://0.0.0.0:9090/test?useServerPrepStmts=false` has no effect. You have to execute the SQL `use database` manually. +5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL. For example, specifying the `test` database in `jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}/test?useServerPrepStmts=false` has no effect. You have to execute the SQL `use database` manually. + +6. Doris 2.1.4 has a bug that may cause errors when reading large amounts of data. It is fixed in the PR [Fix arrow flight result sink #36827](https://github.com/apache/doris/pull/36827), and upgrading to Doris 2.1.5 resolves it. For details of the problem, see: [Questions](https://ask.selectdb.com/questions/D1Ia1/arrow-flight-sql-shi-yong-python-de-adbc-driver-lian-jie-doris-zhi-xing-cha-xun-sql-du-qu-bu-dao-shu-ju) + +7. `Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant` Ignore this warning when using Python. It is an issue in the Python ADBC Client and does not affect queries. + +8. 
If Python reports the error `grpc: received message larger than max (20748753 vs. 16777216)`, add `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value` to the Database Option, as described in [Python: grpc: received message larger than max (20748753 vs. 16777216) #2078](https://github.com/apache/arrow-adbc/issues/2078). + +9. Before Doris version 2.1.7, the error `Reach limit of connections` may be reported. This is because the number of Arrow Flight connections for a single user was not restricted to stay below `max_user_connections` in `UserProperty`, which is 100 by default. You can raise the maximum number of connections for the user Billie to 1000 with `SET PROPERTY FOR 'Billie' 'max_user_connections' = '1000';`, or add `arrow_flight_token_cache_size=50` in `fe.conf` to limit the overall number [...] diff --git a/versioned_docs/version-3.0/db-connect/arrow-flight-sql-connect.md b/versioned_docs/version-3.0/db-connect/arrow-flight-sql-connect.md index 8ca606ab65..0c9e326f5f 100644 --- a/versioned_docs/version-3.0/db-connect/arrow-flight-sql-connect.md +++ b/versioned_docs/version-3.0/db-connect/arrow-flight-sql-connect.md @@ -58,6 +58,11 @@ Import the following modules/libraries in the code to use the installed Library: ```Python import adbc_driver_manager import adbc_driver_flightsql.dbapi as flight_sql + +>>> print(adbc_driver_manager.__version__) +1.1.0 +>>> print(adbc_driver_flightsql.__version__) +1.1.0 ``` ### Connect to Doris @@ -73,7 +78,7 @@ Modify the configuration parameters of Doris FE and BE: Assuming that the Arrow Flight SQL services of FE and BE in the Doris instance will run on ports 9090 and 9091 respectively, and the Doris username/password is "user"/"pass", the connection process is as follows: ```Python -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "user", 
adbc_driver_manager.DatabaseOptions.PASSWORD.value: "pass", }) @@ -222,7 +227,7 @@ import adbc_driver_flightsql.dbapi as flight_sql # step 2, create a client that interacts with the Doris Arrow Flight SQL service. # Modify arrow_flight_sql_port in fe/conf/fe.conf to an available port, such as 9090. # Modify arrow_flight_sql_port in be/conf/be.conf to an available port, such as 9091. -conn = flight_sql.connect(uri="grpc://127.0.0.1:9090", db_kwargs={ +conn = flight_sql.connect(uri="grpc://{FE_HOST}:{fe.conf:arrow_flight_sql_port}", db_kwargs={ adbc_driver_manager.DatabaseOptions.USERNAME.value: "root", adbc_driver_manager.DatabaseOptions.PASSWORD.value: "", }) @@ -301,7 +306,7 @@ import java.sql.ResultSet; import java.sql.Statement; Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver"); -String DB_URL = "jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false" +String DB_URL = "jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false" + "&cachePrepStmts=true&useSSL=false&useEncryption=false"; String USER = "root"; String PASS = ""; @@ -366,7 +371,7 @@ The connection code example is as follows: final BufferAllocator allocator = new RootAllocator(); FlightSqlDriver driver = new FlightSqlDriver(allocator); Map<String, Object> parameters = new HashMap<>(); -AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("0.0.0.0", 9090).getUri().toString()); +AdbcDriver.PARAM_URI.set(parameters, Location.forGrpcInsecure("{FE_HOST}", {fe.conf:arrow_flight_sql_port}).getUri().toString()); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); AdbcDatabase adbcDatabase = driver.open(parameters); @@ -407,14 +412,18 @@ $ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED - $ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ... 
``` -Otherwise, you may see errors like `module java.base does not "opens java.nio" to unnamed module` or `module java.base does not "opens java.nio" to org.apache.arrow.memory.core` +Otherwise, you may see errors such as `module java.base does not "opens java.nio" to unnamed module`, `module java.base does not "opens java.nio" to org.apache.arrow.memory.core`, or `java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.util.MemoryUtil (Internal; Prepare)` + +If you debug in IntelliJ IDEA, you need to add `--add-opens=java.base/java.nio=ALL-UNNAMED` under `Build and run` in `Run/Debug Configurations`. Refer to the picture below: + + The connection code example is as follows: ```Java final Map<String, Object> parameters = new HashMap<>(); AdbcDriver.PARAM_URI.set( - parameters,"jdbc:arrow-flight-sql://0.0.0.0:9090?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); + parameters,"jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false"); AdbcDriver.PARAM_USERNAME.set(parameters, "root"); AdbcDriver.PARAM_PASSWORD.set(parameters, ""); try ( @@ -479,4 +488,12 @@ The Linux kernel version of kylinv10 SP2 and SP3 is only up to 4.19.90-24.4.v210 4. ADBC v0.10, JDBC and Java ADBC/JDBCDriver do not support parallel reading, and the `stmt.executePartitioned()` method is not implemented. You can only use the native FlightClient to implement parallel reading of multiple Endpoints, using the method `sqlClient=new FlightSqlClient, execute=sqlClient.execute(sql), endpoints=execute.getEndpoints(), for(FlightEndpoint endpoint: endpoints)`. In addition, the default AdbcStatement of ADBC V0.10 is actually JdbcStatement. After executeQue [...] -5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL. +5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL. 
For example, specifying the `test` database in `jdbc:arrow-flight-sql://0.0.0.0:9090/test?useServerPrepStmts=false` has no effect. You have to execute the SQL `use database` manually. +5. As of Arrow v15.0, Arrow JDBC Connector does not support specifying the database name in the URL. For example, specifying the `test` database in `jdbc:arrow-flight-sql://{FE_HOST}:{fe.conf:arrow_flight_sql_port}/test?useServerPrepStmts=false` has no effect. You have to execute the SQL `use database` manually. + +6. Doris 2.1.4 has a bug that may cause errors when reading large amounts of data. It is fixed in the PR [Fix arrow flight result sink #36827](https://github.com/apache/doris/pull/36827), and upgrading to Doris 2.1.5 resolves it. For details of the problem, see: [Questions](https://ask.selectdb.com/questions/D1Ia1/arrow-flight-sql-shi-yong-python-de-adbc-driver-lian-jie-doris-zhi-xing-cha-xun-sql-du-qu-bu-dao-shu-ju) + +7. `Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant` Ignore this warning when using Python. It is an issue in the Python ADBC Client and does not affect queries. + +8. If Python reports the error `grpc: received message larger than max (20748753 vs. 16777216)`, add `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value` to the Database Option, as described in [Python: grpc: received message larger than max (20748753 vs. 16777216) #2078](https://github.com/apache/arrow-adbc/issues/2078). + +9. Before Doris version 2.1.7, the error `Reach limit of connections` may be reported. This is because the number of Arrow Flight connections for a single user was not restricted to stay below `max_user_connections` in `UserProperty`, which is 100 by default.
You can raise the maximum number of connections for the user Billie to 1000 with `SET PROPERTY FOR 'Billie' 'max_user_connections' = '1000';`, or add `arrow_flight_token_cache_size=50` in `fe.conf` to limit the overall number [...]
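
For the `grpc: received message larger than max` error in item 8, a minimal sketch of raising the client-side limit through `adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE` might look like the following. The helper names `flight_sql_uri` and `connect_with_larger_messages`, the host, port, credentials, and the 64 MiB limit are illustrative assumptions, not values from the Doris docs, and the limit is passed as a string here.

```python
def flight_sql_uri(fe_host: str, arrow_flight_sql_port: int) -> str:
    """Build the gRPC URI of the FE Arrow Flight SQL service."""
    return f"grpc://{fe_host}:{arrow_flight_sql_port}"


def connect_with_larger_messages(fe_host, arrow_flight_sql_port, user, password,
                                 max_msg_size=64 * 1024 * 1024):
    """Open an ADBC Flight SQL connection with a raised gRPC message limit.

    Hypothetical helper: max_msg_size (64 MiB by default) is an assumed value.
    """
    # Imported lazily so the URI helper above works without the ADBC packages.
    import adbc_driver_manager
    import adbc_driver_flightsql
    import adbc_driver_flightsql.dbapi as flight_sql

    return flight_sql.connect(
        uri=flight_sql_uri(fe_host, arrow_flight_sql_port),
        db_kwargs={
            adbc_driver_manager.DatabaseOptions.USERNAME.value: user,
            adbc_driver_manager.DatabaseOptions.PASSWORD.value: password,
            # Raise the maximum gRPC message size accepted by the client.
            adbc_driver_flightsql.DatabaseOptions.WITH_MAX_MSG_SIZE.value: str(max_msg_size),
        },
    )
```

Calling `connect_with_larger_messages("127.0.0.1", 9090, "root", "")` needs a running Doris FE whose `arrow_flight_sql_port` matches the port passed in; the URI helper alone has no such requirement.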