DachuanXUAN opened a new issue, #45247: URL: https://github.com/apache/doris/issues/45247
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

2.1.5

### What's Wrong?

The SQL is a simple SELECT ... INTO OUTFILE to S3:

    SELECT file_id,
           cast(data_start_time as String) as data_start_time,
           cast(data_end_time as String) as data_end_time,
           device_type...
    FROM t_table_name
    where data_start_time >= '2024-11-28 00:00:00.000'
      and data_start_time < '2024-11-28 01:00:00.000'
    order by file_id, data_start_time
    INTO OUTFILE "s3://xxx/xxx/2024_11_28_00/part_v2_"
    FORMAT AS PARQUET
    PROPERTIES(
        "s3.endpoint" = "http://xxx.com/",
        "s3.access_key" = "xxx",
        "s3.secret_key" = "xxx",
        "s3.region" = "xxx",
        "max_file_size" = "120MB"
    );

With batch_size set to 100,000:

    set batch_size=100000;

| FileNumber | TotalRows | FileSize | URL |
| --- | --- | --- | --- |
| 5 | 14602938 | 671738439 | `s3://xxx/2024_11_28_00/part_v2_66d16409bc2a4b37-9905759434b51248_*` |

With batch_size at its default value:

    set batch_size=4096;

| FileNumber | TotalRows | FileSize | URL |
| --- | --- | --- | --- |
| 10 | 29803106 | 1316012670 | `s3://xxx/2024_11_28_00/part_v2_2cef1e9dba2a4749-a9a688a90843fa53_*` |

The row count with the default batch_size should be the correct one; with a large batch_size, rows go missing (14,602,938 vs. 29,803,106 rows for the same query). If batch_size is set even higher, the SQL hangs, with no visible indication of why.

The reason for setting batch_size at all is that, when exporting Parquet, we want to control the number of rows per block and thereby reduce the number of blocks in the Parquet file. Apart from batch_size, there is no other setting that controls this number.

### What You Expected?

skip

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)