adriangb commented on PR #21931: URL: https://github.com/apache/datafusion/pull/21931#issuecomment-4356288696
> I'm quite in favor of this change. It also avoids blowing up the expression based on number of partitions, which can happen when partition count is high.

Well, it doesn't avoid it completely, and in some ways it makes it worse. We still have one hash map per partition (this cannot be avoided unless we pay the memory and build-time cost of combining them). And the number of probes now scales with the number of partitions, where it used to be constant. But probes are much faster than hashes, which is why I think this will likely be faster unless the partition count is high.
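To make the trade-off concrete, here is a rough sketch of the "hash once, probe every partition's map" pattern. It is a toy model and not the code in this PR: `PartitionTable`, `hash_key`, `build_tables`, and `contains_in_any` are hypothetical names, the tables are simple chained buckets, and rows are placed by key modulo partition count purely to keep the example deterministic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy per-partition hash table that accepts a precomputed hash.
/// (Hypothetical type for illustration only.)
struct PartitionTable {
    buckets: Vec<Vec<(u64, i64)>>,
}

impl PartitionTable {
    fn new(num_buckets: usize) -> Self {
        assert!(num_buckets > 0, "need at least one bucket");
        Self { buckets: vec![Vec::new(); num_buckets] }
    }

    fn bucket_index(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    fn insert(&mut self, hash: u64, key: i64) {
        let idx = self.bucket_index(hash);
        self.buckets[idx].push((hash, key));
    }

    /// Probe with an already-computed hash: no re-hashing of the key.
    fn probe(&self, hash: u64, key: i64) -> bool {
        let idx = self.bucket_index(hash);
        self.buckets[idx].iter().any(|&(h, k)| h == hash && k == key)
    }
}

/// The comparatively expensive step; done once per key.
fn hash_key(key: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Build one table per partition. Placement by `key mod num_partitions`
/// is an assumption made only for this example.
fn build_tables(num_partitions: usize, keys: &[i64]) -> Vec<PartitionTable> {
    let mut tables: Vec<PartitionTable> =
        (0..num_partitions).map(|_| PartitionTable::new(16)).collect();
    for &key in keys {
        let partition = key.rem_euclid(num_partitions as i64) as usize;
        tables[partition].insert(hash_key(key), key);
    }
    tables
}

/// Hash the key once, then probe each partition's table in turn.
/// Returns whether the key was found and how many probes were performed.
fn contains_in_any(tables: &[PartitionTable], key: i64) -> (bool, usize) {
    let hash = hash_key(key); // one hash, regardless of partition count
    let mut probes = 0;
    for table in tables {
        probes += 1; // probe count grows with the number of partitions
        if table.probe(hash, key) {
            return (true, probes);
        }
    }
    (false, probes)
}
```

In this model a key that is absent from every table costs one hash plus one probe per partition, which is the worst case described above: cheap while the partition count is modest, increasingly expensive as it grows.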
