Youngwb opened a new pull request #3186: Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connector URL: https://github.com/apache/incubator-doris/pull/3186 Currently, in the Spark-Doris-Connector, when Spark iteratively obtains each row of data, it needs to synchronously convert the Arrow format data into the row format required by Spark. In order to speed up the conversion process, we can add an asynchronous thread in the Connector, which is responsible for obtaining the Arrow format data from BE and converting it into the row format required by Spark calculation In our test environment, Doris cluster used 1 fe and 7 be (32C+128G). When using Spark-Doris-Connector to query a table containing 67 columns, the original query returned 69 million rows of data took about 2.5min, but after improvement, it reduced to about 1.6min, which reduced the time by about 30%
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org