NGA-TRAN commented on PR #21832: URL: https://github.com/apache/datafusion/pull/21832#issuecomment-4315190423
@adriangb: Thanks for the proposal to add native range partitioning support to DataFusion. That’s one of the solutions Gene suggested. However, from our recent sync with @alamb and the ParadeDB team, Andrew recommended simply mapping the partition index of the build side and probe side, which is what this PR implements.

The extra code is needed because we only apply this for range partitioning (not hash), and only when partition preservation is enabled and no repartitioning occurs beforehand, which aligns with the optimizer rules. This approach benefits all types of range partitions. Gene has a detailed analysis here: https://github.com/apache/datafusion/issues/21207#issuecomment-4254968115

@alamb: If the extended solution isn’t what you had in mind, what would you suggest? Should we move toward native range partitioning support as Adrian proposed, or is the current PR approach sufficient for now?
