gabotechs commented on PR #22010: URL: https://github.com/apache/datafusion/pull/22010#issuecomment-4386367667
> @gabotechs I notice peak memory is quite a bit higher here?

I can imagine how this happens in cases where the fanout is very big. It boils down to the gating mechanism implemented in `RepartitionExec` today:

https://github.com/apache/datafusion/blob/dcf648255b92a34798871139aeba12d95f8f3032/datafusion/physical-plan/src/repartition/distributor_channels.rs#L21-L36

Before this PR, the batches flowing through there were of size `batch_size / output_partitions`, but with this PR, they are of size `batch_size`.

The memory reporting there seems quite unstable, though. For example, these other runs show the same peak memory usage:

- https://github.com/apache/datafusion/pull/22010#issuecomment-4374065644
- https://github.com/apache/datafusion/pull/22010#issuecomment-4374088542
- https://github.com/apache/datafusion/pull/22010#issuecomment-4374117175
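The explanation above depends on how the linked gate behaves. Per its doc comment, each output partition gets its own unbounded channel, and senders are paused only once every channel is non-empty. The sketch below is a simplified, synchronous model of that rule, not DataFusion's actual async implementation. `GatedChannels` and `peak_buffered_rows` are hypothetical names. It assumes the consumers are stalled and that batches are spread round-robin across outputs. Under those assumptions, the gate closes after one batch lands in each channel, so the rows buffered scale with the size of each batch.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified model of the gate in `distributor_channels.rs`:
/// each output partition has its own unbounded queue, and senders may only
/// push while at least one queue is empty. Each entry is a batch's row count.
pub struct GatedChannels {
    queues: Vec<VecDeque<usize>>,
}

impl GatedChannels {
    pub fn new(output_partitions: usize) -> Self {
        Self {
            queues: (0..output_partitions).map(|_| VecDeque::new()).collect(),
        }
    }

    /// The gate is open while at least one output queue is empty.
    pub fn gate_open(&self) -> bool {
        self.queues.iter().any(|q| q.is_empty())
    }

    /// Tries to buffer a batch of `rows` rows for `partition`.
    /// Returns `false` when the gate is closed (a real sender would be pending).
    pub fn try_send(&mut self, partition: usize, rows: usize) -> bool {
        if !self.gate_open() {
            return false;
        }
        self.queues[partition].push_back(rows);
        true
    }

    /// Total rows currently buffered across all output queues.
    pub fn buffered_rows(&self) -> usize {
        self.queues.iter().flat_map(|q| q.iter()).sum()
    }
}

/// Assumed worst case: consumers never read, and batches are sent
/// round-robin until the gate closes. Returns the peak buffered rows.
pub fn peak_buffered_rows(output_partitions: usize, rows_per_batch: usize) -> usize {
    if output_partitions == 0 {
        return 0;
    }
    let mut channels = GatedChannels::new(output_partitions);
    let mut peak = 0;
    let mut next = 0;
    while channels.try_send(next % output_partitions, rows_per_batch) {
        next += 1;
        peak = peak.max(channels.buffered_rows());
    }
    peak
}
```

Take `batch_size = 8192` and 16 output partitions. `peak_buffered_rows(16, 8192 / 16)` gives 8192 rows, and `peak_buffered_rows(16, 8192)` gives 131072 rows, i.e. `output_partitions` times more. Because the gate stays open while any single queue is empty, a skewed distribution can buffer even more in one queue than this round-robin case shows.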
