zhuqi-lucas opened a new issue, #22732:
URL: https://github.com/apache/datafusion/issues/22732
## Describe the bug / opportunity
`LogicalPlan` is 320 bytes on the stack today, but the typical
query-execution path never produces the variants that drive that size. The
`Ddl(DdlStatement)` variant is the offender: it carries `CreateExternalTable`
(312 bytes) and `CreateFunction` (288 bytes), and the enum-size rule
(`max(variant) + tag`) forces the whole `LogicalPlan` enum to the same width on
every code path — including SELECT queries that will never instantiate a DDL
node.
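As a rough illustration of the enum-size rule, here is a minimal sketch using hypothetical stand-in types (`SmallNode`, `HugeNode`, and `Plan` are not DataFusion types). The payloads are plain `u64` arrays, so the enum can never be narrower than its largest variant:

```rust
use std::mem::size_of;

// Hypothetical stand-ins, not DataFusion types.
#[allow(dead_code)]
struct SmallNode([u64; 5]); // 40 bytes of payload
#[allow(dead_code)]
struct HugeNode([u64; 39]); // 312 bytes of payload

#[allow(dead_code)]
enum Plan {
    Small(SmallNode),
    Huge(HugeNode),
}

fn main() {
    println!("SmallNode: {} bytes", size_of::<SmallNode>());
    println!("HugeNode:  {} bytes", size_of::<HugeNode>());
    println!("Plan:      {} bytes", size_of::<Plan>());
    // An enum is always at least as wide as its largest variant.
    assert!(size_of::<Plan>() >= size_of::<HugeNode>());
}
```

Every `Plan` value reserves room for `HugeNode`, whether or not that variant is ever constructed.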
This shows up directly on the planning hot path. Profiling `sql_planner`
(samply, `logical_plan_tpch_all`) on macOS aarch64:
```
55% in sql_planner binary (DataFusion + Rust stdlib)
31% libsystem_malloc.dylib (malloc / free / realloc)
13% libsystem_platform.dylib (memcpy / memmove)
1% other (kernel, dyld, pthread)
```
A non-trivial share of the 13% memcpy/memmove time is `LogicalPlan` moves:
every `std::mem::take` in the optimizer's in-place rewriters, every owned-API
`LogicalPlan::map_*`, and every `Arc<LogicalPlan>` write currently shuffles 320
bytes, even when the active variant is something small like `Projection` (40
bytes) or `Filter` (128 bytes).
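The point about moves can be sketched with a hypothetical `Plan` enum (a stand-in, not the real `LogicalPlan`). A value holding a small variant still occupies the full enum width, and `std::mem::take` moves that whole value. How many bytes the generated code actually copies is up to the optimizer, so treat this as an illustration of the semantics rather than a measurement:

```rust
use std::mem::{size_of, size_of_val, take};

// Hypothetical stand-in for a plan enum with one oversized variant.
#[allow(dead_code)]
#[derive(Default)]
enum Plan {
    #[default]
    Empty,
    Small([u64; 5]),
    Huge([u64; 39]),
}

fn main() {
    let mut node = Plan::Small([1; 5]);
    // The value holds the small variant but occupies the full enum width.
    assert_eq!(size_of_val(&node), size_of::<Plan>());
    // `take` moves the whole value out and leaves `Plan::default()` behind.
    let owned = take(&mut node);
    assert!(matches!(owned, Plan::Small(_)));
    assert!(matches!(node, Plan::Empty));
    println!("every Plan value occupies {} bytes", size_of::<Plan>());
}
```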
### Per-variant sizes
```
=== LogicalPlan enum total ===
320 bytes LogicalPlan
=== Per-variant inner struct ===
40 bytes Projection
128 bytes Filter
40 bytes Window
64 bytes Aggregate
48 bytes Sort
176 bytes Join
40 bytes Repartition
32 bytes Union
56 bytes Subquery
72 bytes SubqueryAlias
24 bytes Limit
88 bytes Distinct
16 bytes Extension
56 bytes RecursiveQuery
48 bytes Analyze
48 bytes Explain
168 bytes TableScan
32 bytes Values
144 bytes Unnest
96 bytes DmlStatement
120 bytes CreateMemoryTable
96 bytes CreateView
88 bytes DistinctOn
56 bytes Statement
320 bytes DdlStatement <-- forces LogicalPlan to 320
16 bytes EmptyRelation
16 bytes DescribeTable
=== Inside DdlStatement ===
312 bytes CreateExternalTable <-- dominates DdlStatement
288 bytes CreateFunction <-- second-largest
144 bytes CreateIndex
72 bytes DropTable / DropView
48 bytes DropCatalogSchema
40 bytes CreateCatalog / CreateCatalogSchema / DropFunction
```
If `CreateExternalTable` and `CreateFunction` are `Box`ed inside
`DdlStatement`, the max DDL variant drops to `CreateIndex` at 144 bytes, the
max `LogicalPlan` variant becomes `Join` at 176, and `LogicalPlan` shrinks to
**176 bytes (–45%)**: the enum discriminant fits inside `Join`'s alignment
padding, so `LogicalPlan` ends up the same width as `Join` itself. The cost is
one heap allocation per `CreateExternalTable` or `CreateFunction` plan, which
is negligible because DDL plans are not on the per-query hot path.
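The effect of boxing can be sketched with hypothetical stand-ins (`Common`, `Rare`, `Inline`, and `Boxed` are illustrative names, not DataFusion types). The exact width of the boxed enum depends on whether the compiler can hide the tag in padding or a niche, so the sketch only checks that it is strictly smaller:

```rust
use std::mem::size_of;

// Hypothetical stand-ins, not DataFusion types.
#[allow(dead_code)]
struct Common([u64; 22]); // 176 bytes, plays the role of a hot-path node
#[allow(dead_code)]
struct Rare([u64; 39]); // 312 bytes, plays the role of a large DDL payload

// Large payload stored inline: the enum must be wide enough to hold it.
#[allow(dead_code)]
enum Inline {
    Common(Common),
    Rare(Rare),
}

// Large payload behind a pointer: the variant only stores the `Box`.
#[allow(dead_code)]
enum Boxed {
    Common(Common),
    Rare(Box<Rare>),
}

fn main() {
    println!("Inline: {} bytes", size_of::<Inline>());
    println!("Boxed:  {} bytes", size_of::<Boxed>());
    assert!(size_of::<Boxed>() < size_of::<Inline>());
}
```

Constructing `Boxed::Rare` costs one heap allocation, while `Boxed::Common` values get the narrower layout at no extra cost.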
## To Reproduce
```rust
// in datafusion/expr, with all relevant types in scope:
println!("{}", std::mem::size_of::<LogicalPlan>()); // 320
println!("{}", std::mem::size_of::<DdlStatement>()); // 320
println!("{}", std::mem::size_of::<CreateExternalTable>()); // 312
println!("{}", std::mem::size_of::<CreateFunction>()); // 288
```
## Expected behavior
`LogicalPlan` should not be sized by variants that never appear on the query
path. Moving the two outsized DDL variants behind a `Box` brings `LogicalPlan`
to 176 bytes, a size driven by `Join`. Every plan node on every query pays that
width, so it should be set by a variant that queries actually produce.
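One way to keep such a size from regressing is a compile-time assertion. The following is a minimal sketch with a hypothetical `Plan` enum and a hypothetical `MAX_PLAN_BYTES` budget; it is not an existing DataFusion check:

```rust
use std::mem::size_of;

// Hypothetical stand-in enum: a hot-path variant plus a boxed rare variant.
#[allow(dead_code)]
enum Plan {
    Join([u64; 20]),
    Ddl(Box<[u64; 39]>),
}

// Hypothetical size budget, mirroring the 176-byte target discussed above.
const MAX_PLAN_BYTES: usize = 176;

// Evaluated at compile time: the build fails if `Plan` outgrows the budget.
const _: () = assert!(size_of::<Plan>() <= MAX_PLAN_BYTES);

fn main() {
    println!("Plan: {} bytes (budget {})", size_of::<Plan>(), MAX_PLAN_BYTES);
}
```

If someone later adds a large inline variant, the build breaks at the assertion instead of silently widening every plan node.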
## Additional context
Local `cargo bench -p datafusion --bench sql_planner --quick` on macOS
aarch64, comparing main vs. boxed DDL variants:
| bench | main | boxed | delta |
|---|---|---|---|
| `optimizer_tpch_all` | 8.61 ms | 8.18 ms | **–5.0%** |
| `optimizer_tpcds_all` | 168.0 ms | 163.5 ms | **–2.7%** |
Smaller benches (sub-200 µs) are within `--quick` noise.
A CI bench run on the GKE aarch64 runner should give a tighter signal; I'm
willing to open a draft PR so a maintainer can trigger it.