gene-bordegaray commented on code in PR #23184:
URL: https://github.com/apache/datafusion/pull/23184#discussion_r3474474416


##########
datafusion/physical-optimizer/src/ensure_requirements/enforce_distribution.rs:
##########
@@ -1058,6 +1099,59 @@ fn get_repartition_requirement_status(
         .collect())
 }
 
+/// Returns distribution state for a partitioned join's children.
+///
+/// This is optimizer policy: partitioned joins require children that can be
+/// paired by partition index. Inner hash joins can reuse compatible range
+/// partitioning; otherwise the existing hash repartitioning policy applies.
+fn partitioned_join_distribution(

Review Comment:
   I noticed that this simple checking logic is slightly duplicated here, [here](https://github.com/apache/datafusion/pull/23184/changes#diff-2aeec4e62bd0ac4a04cc7fdf726457955f2417a0e745c9014b96fe51d6d7f897R188), and [here](https://github.com/apache/datafusion/pull/23184/changes#diff-40a284c689424c410dd364bfd40cb8181ce85fa14d6f5cfbaa36eb378d10302bR915).
   
   There may be a good way to extract it, but I didn't want to preemptively make a public change on speculation. I am just noting it for now; let me know if anyone has suggestions.



##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -406,37 +391,56 @@ impl Partitioning {
         }
     }
 
-    /// Returns true when `self` and `other` describe compatible partition maps.
+    /// Returns true when two partitionings both satisfy their own distribution
+    /// requirements and can be paired by partition index.
+    ///
+    /// Use this for multi-input operators, such as partitioned joins, where
+    /// each child has a different schema, required [`Distribution`], and
+    /// expression-equivalence context.
+    ///
+    /// ```text
+    /// # co-partitioned: each side satisfies its own requirement, and boundaries match
+    /// left:  Range(left.a ASC,  [10, 20]), required KeyPartitioned(left.a)
+    /// right: Range(right.x ASC, [10, 20]), required KeyPartitioned(right.x)
     ///
-    /// Compatible partition maps can be used for partition-local behavior: if
-    /// this returns true, partition `i` from both partitionings can be treated
-    /// as covering the same partition domain. This is stricter than
-    /// [`Self::satisfaction`], which only answers whether this partitioning can
-    /// satisfy a required distribution.
-    pub fn compatible_with(
+    /// # not compatible: right side does not satisfy a hash-specific requirement
+    /// left:  Range(left.a ASC,  [10, 20]), required KeyPartitioned(left.a)
+    /// right: Range(right.x ASC, [10, 20]), required HashPartitioned(right.x)
+    ///
+    /// # not compatible: boundaries differ
+    /// left:  Range(left.a ASC,  [10, 20]), required KeyPartitioned(left.a)
+    /// right: Range(right.x ASC, [15, 20]), required KeyPartitioned(right.x)
+    /// ```
+    pub fn co_partitioned_with(

Review Comment:
   I realized the concept was a bit off. I think this rename is OK as long as we haven't had a release with it... 😅
   
   I linked the mirroring concepts used in Trino and Spark in the PR description.
   
   Lesson learned: have a consumer of the public API first.
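
To make the three doc-comment cases concrete, here is a minimal standalone sketch of the rule as I read it: each side must satisfy its own requirement, and the range boundaries must match. `RangePart`, `Required`, `satisfies`, `co_partitioned`, and `doc_cases` are hypothetical simplified stand-ins, not the actual `Partitioning` / `Distribution` types or the signature of `co_partitioned_with`; sort order and expression equivalence are ignored.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT DataFusion's real types or method signatures.

/// Range partitioning on a single key with ascending split points.
#[derive(Debug, Clone, PartialEq)]
struct RangePart {
    key: &'static str,
    bounds: Vec<i64>,
}

/// The distribution an operator requires from one child.
#[derive(Debug, Clone, PartialEq)]
enum Required {
    KeyPartitioned(&'static str),
    HashPartitioned(&'static str),
}

impl RangePart {
    /// A range partitioning on `key` satisfies a key-partitioned requirement
    /// on the same key, but never a hash-specific requirement.
    fn satisfies(&self, req: &Required) -> bool {
        match req {
            Required::KeyPartitioned(k) => self.key == *k,
            Required::HashPartitioned(_) => false,
        }
    }
}

/// Both sides satisfy their own requirement AND partition `i` on the left
/// covers the same key domain as partition `i` on the right.
fn co_partitioned(
    left: &RangePart,
    left_req: &Required,
    right: &RangePart,
    right_req: &Required,
) -> bool {
    left.satisfies(left_req)
        && right.satisfies(right_req)
        && left.bounds == right.bounds
}

/// Evaluates the three cases from the doc comment, in order.
fn doc_cases() -> [bool; 3] {
    let left = RangePart { key: "left.a", bounds: vec![10, 20] };
    let left_req = Required::KeyPartitioned("left.a");
    let right = RangePart { key: "right.x", bounds: vec![10, 20] };
    let shifted = RangePart { key: "right.x", bounds: vec![15, 20] };
    [
        // matching boundaries, both key-partitioned
        co_partitioned(&left, &left_req, &right, &Required::KeyPartitioned("right.x")),
        // right side has a hash-specific requirement
        co_partitioned(&left, &left_req, &right, &Required::HashPartitioned("right.x")),
        // boundaries differ
        co_partitioned(&left, &left_req, &shifted, &Required::KeyPartitioned("right.x")),
    ]
}
```

Only the first case pairs by partition index in this sketch. The point of taking both requirements as arguments is that each child is checked in its own schema, which a single `self`-versus-`other` comparison cannot express.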



