adriangb opened a new pull request, #21966: URL: https://github.com/apache/datafusion/pull/21966
## Which issue does this PR close? Related to <https://github.com/apache/datafusion/issues/21835> (no `Closes` — this PR only does the `PhysicalExpr` half and only registers two built-ins; full closure is multi-PR). ## Rationale for this change DataFusion has no in-tree serialization on the `PhysicalExpr` trait. The proto crate handles serialization with a downcast chain inside `datafusion-proto/src/physical_plan/{from_proto,to_proto}.rs`, which forces every expression author to expose private state via `pub` accessors so the proto crate can poke at it (#21835). PR #21929's draft branch added a `PhysicalExpr::to_proto` hook to fix that — but the hook is protobuf-specific. If we ever want JSON, bincode, or anything else, we'd need a parallel `to_json`, `to_bincode`, etc., one trait method per format. This PR introduces a single `serde`-style hook on `PhysicalExpr` that works with any `serde` format (JSON for debugging, bincode for binary, a serde adapter on top of protobuf, etc.). The shape is intentionally minimal so it can be reviewed in isolation: - `PhysicalExpr::serde_tag(&self) -> &'static str` — default `""` means "not serializable". - `PhysicalExpr::erased_serialize(&self) -> Box<dyn erased_serde::Serialize + '_>` — default returns an error sentinel. - `PhysicalExprDeserialize` trait + `PhysicalExprRegistry` builder + `DeserializeContext` over `&mut dyn erased_serde::Deserializer<'de>`. The deserialization side runs through any `serde::Deserializer` (`Registry::deserialize`) with `deserialize_json` as a convenience entry point. Trait-object children recurse through `registry.expr_seed()` inside hand-rolled visitors. Wire stability across DataFusion versions is **not** a goal here — the `datafusion-proto` crate stays the path for that. The serde hook is for in-process tooling, debugging, and intra-cluster ephemeral serialization. A future PR can write a serde adapter on top of prost so proto becomes another consumer of `erased_serialize`, at which point the proto crate's downcast chain can collapse. ## What changes are included in this PR? - **Trait additions on `PhysicalExpr`** (`physical-expr-common`): `serde_tag` and `erased_serialize`, both with default impls so this is non-breaking for existing in-tree and out-of-tree implementers. - **`impl Serialize for dyn PhysicalExpr`** producing a `{tag, data}` envelope. `Arc<dyn PhysicalExpr>` is automatically `Serialize` via serde's `rc` feature. - **`PhysicalExprDeserialize` trait + `PhysicalExprRegistry`** (`physical-expr-common`). Builder-style (`empty().with::<T>().with::<U>()`), modeled on the optimizer's rule list. No globals, no `inventory` crate (which has wasm/FFI gotchas DataFusion cares about). - `with::<T>()` panics on duplicate tag; `try_with::<T>()` returns `Result` for the runtime-plugin case. - **Two built-ins wired up**: `Column` (leaf) and `NotExpr` (single trait-object child, exercises the `DeserializeContext::deserialize_unary` helper). - **Runnable example** at `datafusion-examples/examples/physical_expr_serde` — defines a custom `PhysicalExpr` with a trait-object child, registers it on top of `default_registry()`, and round-trips through JSON. - **Library-user-guide page** walking through the trait surface, child handling, tag uniqueness, format choice, and how this differs from `datafusion-proto`. The remaining built-ins follow the same shape and will land in follow-up PRs in batches grouped by what they need from the rest of the codebase (`Operator` first, then `ScalarValue`/`arrow::DataType` for `Literal`/`Cast*`/`Like`/`InList`/`Case`). ## Are these changes tested? Yes: - 6 unit tests in `datafusion-physical-expr` exercise: Column round-trip, NotExpr round-trip, NOT(NOT(a)) nested round-trip, unknown-tag error path, `try_with` duplicate-tag error, `with` duplicate-tag panic. - The example is runnable end-to-end (`cargo run --example physical_expr_serde`) and prints the JSON, the round-tripped expression, and the expected error when the registry doesn't know about the custom tag. - `cargo check --workspace` and `cargo clippy --all-targets --all-features -- -D warnings` (on touched crates) clean. ## Are there any user-facing changes? Yes, but additive: - Two new optional methods on the `PhysicalExpr` trait, both with defaults. Existing implementers (in-tree and out-of-tree) compile unchanged; they just won't be serializable through the new path until they opt in. - New public types: `PhysicalExprRegistry`, `PhysicalExprDeserialize`, `DeserializeContext`, `ExprSeed`, plus `default_registry()` in `datafusion-physical-expr`. - New library-user-guide page documenting the surface. 🤖 Generated with [Claude Code](https://claude.com/claude-code) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
