WeCodingNow opened a new issue, #931: URL: https://github.com/apache/arrow-java/issues/931
### Describe the enhancement requested

Arrow Flight SQL has a feature for ingesting massive datasets, Bulk Ingestion: https://github.com/apache/arrow/issues/38255

It would be beneficial to use those dedicated RPC methods for batched prepared statement calls when the prepared statement only inserts data. For example, when Spark writes data, it generates a simple SQL query like `INSERT INTO table(field1, field2, ...) VALUES (?, ?, ...)`, creates a prepared statement, and then uses the prepared statement update RPC method to insert the rows of the dataset. If this feature is implemented, the driver could use `DoPut(CommandStatementIngest)` instead.

There are Arrow Flight SQL server implementations that already work like this: when a `DoAction(ActionCreatePreparedStatementRequest)` is executed, the server creates up to two versions of the data structure underlying the prepared statement. One is a handle to a full-scale query engine execution procedure (e.g. DataFusion's logical plan), and the other is a handle to a very simple procedure that just writes the received record batches to storage. Of course, the second procedure can only be generated when the query has a certain form, like the one used by Spark.

Instead, I think it should be possible to move this logic (parse the query, then decide whether to use the optimized procedure) into the client.

The use case for this integration: developers of Arrow Flight SQL servers could implement bulk ingestion command handlers and avoid implementing special logic for handling batched inserts. The client would then enable a newly introduced driver option that allows the driver to decide whether to use the bulk ingestion RPC methods for inserting data.
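To make the client-side decision concrete, here is a minimal sketch of how a driver might recognize the Spark-style statement before choosing an RPC path. It is an illustration only: `SimpleInsertDetector` and `IngestTarget` are hypothetical names, not part of arrow-java, and the regular expression is an assumption that covers only the simplest form (unquoted identifiers, one `VALUES` tuple made entirely of `?` placeholders). A real implementation would likely need a proper SQL parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not an arrow-java API. */
final class SimpleInsertDetector {
    // Assumed shape: INSERT INTO <table> (<col>, ...) VALUES (?, ...)
    private static final Pattern SIMPLE_INSERT = Pattern.compile(
        "^\\s*INSERT\\s+INTO\\s+([A-Za-z_][\\w.]*)\\s*"
            + "\\(([^()]*)\\)\\s*VALUES\\s*\\(([^()]*)\\)\\s*;?\\s*$",
        Pattern.CASE_INSENSITIVE);

    /** Table and column names a bulk ingestion request would need. */
    static final class IngestTarget {
        final String table;
        final List<String> columns;

        IngestTarget(String table, List<String> columns) {
            this.table = table;
            this.columns = columns;
        }
    }

    /**
     * Returns the ingest target if the statement is a plain parameterized
     * insert, or empty if the driver should fall back to the regular
     * prepared statement update path.
     */
    static Optional<IngestTarget> detect(String sql) {
        Matcher m = SIMPLE_INSERT.matcher(sql);
        if (!m.matches()) {
            return Optional.empty();
        }
        List<String> columns = new ArrayList<>();
        // The -1 limit keeps trailing empty items so "a, b," is rejected.
        for (String raw : m.group(2).split(",", -1)) {
            String name = raw.trim();
            if (!name.matches("[A-Za-z_]\\w*")) {
                return Optional.empty();
            }
            columns.add(name);
        }
        String[] values = m.group(3).split(",", -1);
        if (values.length != columns.size()) {
            return Optional.empty();
        }
        for (String value : values) {
            // Any literal or expression disqualifies the fast path.
            if (!value.trim().equals("?")) {
                return Optional.empty();
            }
        }
        return Optional.of(new IngestTarget(m.group(1), columns));
    }
}
```

With the proposed driver option enabled, a non-empty result would let the driver send the bound parameter batches through `DoPut(CommandStatementIngest)`, while an empty result would keep the existing prepared statement behavior.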
