Youngwb opened a new pull request #3186: Support converting Arrow data to 
RowBatch asynchronously in Spark-Doris-Connector
URL: https://github.com/apache/incubator-doris/pull/3186
 
 
   Currently, in the Spark-Doris-Connector, each time Spark iterates to the 
next row of data, it has to synchronously convert the Arrow-format data into 
the row format required by Spark. To speed up this conversion, we can add an 
asynchronous thread to the Connector that is responsible for fetching the 
Arrow-format data from BE and converting it into the row format required by 
Spark for computation.
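
   The idea can be sketched as a producer/consumer pair: a background thread 
converts columnar batches into row batches and hands them over through a 
bounded queue, while the reading side only dequeues finished batches. This is 
a minimal illustration, not the code in this pull request. The class name 
`AsyncRowBatchIterator`, the `Object[][]` stand-in for an Arrow batch, and 
the queue capacity parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: these names are illustrative, not the connector's classes.
// A columnar batch is modeled as columns[col][row]; a row batch as a list of Object[].
final class AsyncRowBatchIterator implements Iterator<List<Object[]>> {
    // End-of-stream marker, compared by identity only.
    private static final List<Object[]> EOS = new ArrayList<>();

    private final BlockingQueue<List<Object[]>> queue;
    private volatile RuntimeException failure; // set if the source iterator throws
    private List<Object[]> next;               // batch fetched by hasNext(), if any

    AsyncRowBatchIterator(Iterator<Object[][]> columnarBatches, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread worker = new Thread(() -> {
            try {
                try {
                    // Convert ahead of the consumer; put() blocks when the queue is full.
                    while (columnarBatches.hasNext()) {
                        queue.put(toRows(columnarBatches.next()));
                    }
                } catch (RuntimeException e) {
                    failure = e;
                }
                queue.put(EOS); // always signal the end so the consumer cannot hang
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "arrow-to-row-converter");
        worker.setDaemon(true);
        worker.start();
    }

    // Transpose columns[col][row] into rows[row][col].
    static List<Object[]> toRows(Object[][] columns) {
        int rowCount = columns.length == 0 ? 0 : columns[0].length;
        List<Object[]> rows = new ArrayList<>(rowCount);
        for (int r = 0; r < rowCount; r++) {
            Object[] row = new Object[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = columns[c][r];
            }
            rows.add(row);
        }
        return rows;
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // waits only if the worker has fallen behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a row batch", e);
            }
        }
        if (next == EOS && failure != null) {
            throw failure;
        }
        return next != EOS;
    }

    @Override
    public List<Object[]> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<Object[]> batch = next;
        next = null;
        return batch;
    }
}
```

   The bounded queue keeps memory use in check: the worker can run at most 
`capacity` batches ahead of the reader, so conversion overlaps with Spark's 
row processing without buffering the whole result set.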
   
   In our test environment, the Doris cluster had 1 FE and 7 BEs (32 cores, 
128 GB memory). When using the Spark-Doris-Connector to query a table with 
67 columns, a query returning 69 million rows originally took about 2.5 min. 
With this improvement it took about 1.6 min, cutting the time by roughly a 
third.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
