gene-bordegaray commented on code in PR #22207:
URL: https://github.com/apache/datafusion/pull/22207#discussion_r3249849428


##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -133,13 +135,94 @@ impl Display for Partitioning {
                     .join(", ");
                 write!(f, "Hash([{phy_exprs_str}], {size})")
             }
+            Partitioning::Expr(expr) => write!(f, "{expr}"),
             Partitioning::UnknownPartitioning(size) => {
                 write!(f, "UnknownPartitioning({size})")
             }
         }
     }
 }
 
+/// Physical expression partitioning.
+///
+/// Partition `i` contains rows where `partition_exprs[i]` evaluates to true.
+/// The source declaring partitioning is responsible for ensuring that, for every
+/// row emitted, exactly one partition expression evaluates to true and that row
+/// is emitted by the corresponding partition. The expressions do not need to
+/// cover values that the plan cannot emit.
+///
+/// For example, a scan that can only emit rows for `2022` can declare two date
+/// partitions as:
+///
+/// ```text
+/// partition_exprs[0] = date >= 2022-01-01 AND date < 2022-07-01
+/// partition_exprs[1] = date >= 2022-07-01 AND date < 2023-01-01
+/// ```
+///
+/// This is valid even though values outside `2022` are not covered, as long as
+/// the source does not emit rows outside those ranges. It would not be valid
+/// for this plan to emit a row from `partition[i]` whose date is not within
+/// `partition_exprs[i]`, or to emit a row whose date matches multiple
+/// partition expressions.
+///
+/// More complex partitioning can be represented using normal expression
+/// composition. For example, one partition in a date and city range can be
+/// represented as `date >= 2021-01-01 AND date < 2022-07-01 AND city < 'Boston'`.
+///
+/// NOTE: Optimizer and execution behavior for this partitioning is intentionally
+/// not implemented and will be introduced incrementally.
+#[derive(Debug, Clone)]
+pub struct ExprPartitioning {

Review Comment:
   The reason is that we aim to be more flexible here. This can support range partitioning, but it also extends beyond that to any physical expr the source wants to provide. I gave range in the description only as one concrete example of how this could be used.
   
   Someone could partition using this scheme on something like a city column, where:
   
   ```text
   partition 1 -> city = "New York"
   partition 2 -> city = "London"
   ```
   
   and so on.
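   
   To make the contract concrete, here is a minimal, self-contained sketch of the idea. It does not use the DataFusion API: `Row`, `PartitionExpr`, `is_new_york`, `is_london`, and `partition_for` are hypothetical names invented for illustration, and plain functions stand in for physical expressions. Indices are zero-based here, matching `partition_exprs[i]` in the doc comment.
   
   ```rust
   /// Hypothetical stand-in for a row; only the `city` column matters here.
   struct Row {
       city: &'static str,
   }
   
   /// Hypothetical stand-in for a partition expression: a boolean predicate
   /// over a row. The real type would be a physical expression.
   type PartitionExpr = fn(&Row) -> bool;
   
   fn is_new_york(row: &Row) -> bool {
       row.city == "New York"
   }
   
   fn is_london(row: &Row) -> bool {
       row.city == "London"
   }
   
   /// Returns the index of the single partition whose expression is true for
   /// `row`. Returns `None` when zero or several expressions match, which
   /// would violate the "exactly one" contract for an emitted row.
   fn partition_for(row: &Row, exprs: &[PartitionExpr]) -> Option<usize> {
       let mut found = None;
       for (i, expr) in exprs.iter().enumerate() {
           if expr(row) {
               if found.is_some() {
                   // A second match means the expressions overlap.
                   return None;
               }
               found = Some(i);
           }
       }
       found
   }
   ```
   
   With `[is_new_york, is_london]` as the expressions, a "New York" row maps to partition 0 and a "London" row to partition 1. A "Paris" row matches nothing, which is acceptable only if the source never emits such a row, since the expressions do not need to cover values the plan cannot emit.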



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

