stuhood commented on code in PR #21807:
URL: https://github.com/apache/datafusion/pull/21807#discussion_r3140190591
##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -438,6 +441,23 @@ pub trait PhysicalExpr: Any + Send + Sync + Display +
Debug + DynEq + DynHash {
fn placement(&self) -> ExpressionPlacement {
ExpressionPlacement::KeepInPlace
}
+
+ /// Return a stable, globally-unique identifier for this [`PhysicalExpr`],
if it
+ /// has one.
+ ///
+ /// This identifier tracks which expressions which are connected (e.g.
`DynamicFilterPhysicalExpr`
Review Comment:
```suggestion
/// This identifier tracks expressions which are connected (e.g.
`DynamicFilterPhysicalExpr`
```
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -65,6 +65,10 @@ pub struct DynamicFilterPhysicalExpr {
/// If any of the children were remapped / modified (e.g. to adjust for
projections) we need to keep track of the new children
/// so that when we update `current()` in subsequent iterations we can
re-apply the replacements.
remapped_children: Option<Vec<Arc<dyn PhysicalExpr>>>,
+ /// Unique identifier for this dynamic filter.
+ ///
+ /// Derived filters (ex. via `with_new_children`) should inherit the
expression id of the source filter.
+ expression_id: u64,
/// The source of dynamic filters.
inner: Arc<RwLock<Inner>>,
Review Comment:
It would be nice if it were clearer exactly how the `Inner` relates to the
`expression_id`.
With (mostly) an outsider's perspective, it seems like the `expression_id`
in this case should actually literally be an id for the `inner` state? So as
with https://github.com/apache/datafusion/issues/21650 , it feels like this is
more like a `mutable_state_id`, perhaps?
Because you might have two different `DynamicFilterPhysicalExpr` (different
wrapping expressions) wrapped around the same inner state... and at that point,
the `expression_id` is not an "id for the expression".
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -76,16 +80,40 @@ pub struct DynamicFilterPhysicalExpr {
nullable: Arc<RwLock<Option<bool>>>,
}
-#[derive(Debug)]
-struct Inner {
+/// Atomic internal state of a [`DynamicFilterPhysicalExpr`].
+///
+/// **Warning:** exposed publicly solely so that proto (de)serialization in
+/// `datafusion-proto` can read and rebuild this state. Do not treat this type
+/// or its layout as a stable API.
+#[derive(Debug, Clone)]
+pub struct Inner {
/// A counter that gets incremented every time the expression is updated
so that we can track changes cheaply.
/// This is used for [`PhysicalExpr::snapshot_generation`] to have a cheap
check for changes.
- generation: u64,
- expr: Arc<dyn PhysicalExpr>,
+ pub generation: u64,
+ pub expr: Arc<dyn PhysicalExpr>,
/// Flag for quick synchronous check if filter is complete.
/// This is redundant with the watch channel state, but allows us to
return immediately
/// from `wait_complete()` without subscribing if already complete.
- is_complete: bool,
+ pub is_complete: bool,
+}
+
+// TODO: Include expression_id in debug output.
Review Comment:
Ideally even in the `DisplayAs` representation for plans, I would think.
It's actually possibly the _most_ important thing to have in the plan when
rendering a dynamic filter?
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -346,6 +375,75 @@ impl DynamicFilterPhysicalExpr {
write!(f, " ]")
}
+
+ /// Generate a new expression id for this filter.
+ fn new_expression_id() -> u64 {
+ random::<u64>()
+ }
+
+ /// Return the filter's original children (before any remapping).
+ ///
+ /// **Warning:** intended only for `datafusion-proto` (de)serialization.
+ /// Not a stable API.
+ pub fn original_children(&self) -> &[Arc<dyn PhysicalExpr>] {
+ &self.children
+ }
+
+ /// Return the filter's remapped children, if any have been set via
+ /// [`PhysicalExpr::with_new_children`].
+ ///
+ /// **Warning:** intended only for `datafusion-proto` (de)serialization.
+ /// Not a stable API.
+ pub fn remapped_children(&self) -> Option<&[Arc<dyn PhysicalExpr>]> {
+ self.remapped_children.as_deref()
+ }
+
+ /// Rebuild a `DynamicFilterPhysicalExpr` from its stored parts. Used by
+ /// proto deserialization to preserve `expression_id` across a roundtrip
+ /// rather than minting a fresh one.
+ ///
+ /// **Warning:** intended only for `datafusion-proto` (de)serialization.
+ /// Not a stable API.
Review Comment:
Could this be generic via the [shared
state](https://github.com/apache/datafusion/issues/21650) proposal? Not sure
what is up with that. Ditto `fn inner`.
##########
datafusion/physical-expr/Cargo.toml:
##########
@@ -55,6 +55,7 @@ indexmap = { workspace = true }
itertools = { workspace = true, features = ["use_std"] }
parking_lot = { workspace = true }
petgraph = "0.8.3"
+rand = { workspace = true }
Review Comment:
Not sure what the conventions are, but: introducing random order to the
`physical-expr` crate _seems_ like a big deal? Is this being used to generate
something that could be deterministic from some sort of context instead?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]