yonatan-sevenai opened a new issue, #21490:
URL: https://github.com/apache/datafusion/issues/21490
### Describe the bug
`plan_to_sql` generates invalid SQL when an outer Projection excludes a
column that the Sort references by alias. The alias is defined in an inner
Projection that the Unparser flattens away.
Given this logical plan:
```
Projection: [a, b] -- outer: excludes sort column
"c"
Sort: [c DESC, fetch=1] -- references alias "c"
Projection: [X AS a, Y AS b, Z AS c] -- defines alias "c" →
physical column "Z"
SubqueryAlias: t
TableScan: phys_table [X, Y, Z]
```
The Unparser produces:
```sql
SELECT t."X" AS a, t."Y" AS b FROM phys_table AS t ORDER BY c DESC NULLS
FIRST LIMIT 1
```
Column `c` does not exist — it was defined as `Z AS c` in the inner
Projection, but that definition was dropped during plan rewriting.
### Root cause
The function `rewrite_plan_for_sort_on_non_projected_fields` in
`datafusion/sql/src/unparser/rewrite.rs` detects that the outer Projection's
columns plus the Sort's columns form the same set as the inner Projection's
columns. It then flattens the two Projections
into one by replacing the inner Projection's expressions with the outer
Projection's mapped expressions (line 257:
`inner_p.expr.clone_from(&new_exprs)`).
The set comparison succeeds because:
- Inner Projection aliases are collapsed to their alias names: `{a, b, c}`
- Outer Projection `{a, b}` + Sort column `{c}` = `{a, b, c}`
However, `new_exprs` is built from only the outer Projection's expressions
(`[X AS a, Y AS b]`). The sort column's alias definition (`Z AS c`) is not
included. The rewritten plan becomes:
```
Sort: [c DESC, fetch=1] -- still references "c"
Projection: [X AS a, Y AS b] -- "Z AS c" is gone
SubqueryAlias: t
TableScan: phys_table
```
The Sort's reference to `c` is now dangling.
### To Reproduce
1. The **outer Projection must exclude the Sort column** — if the outer
Projection includes `c`, the alias is preserved in `new_exprs` and the bug does
not manifest.
2. The Sort column must be defined as an **alias** in the inner Projection
(not a plain column passthrough).
3. A **SubqueryAlias** (or similar) node below the inner Projection is
necessary to create the two-Projection shape that triggers the rewrite.
### Expected behavior
The generated SQL should either:
- Inline the physical column into ORDER BY: `ORDER BY t."Z" DESC NULLS
FIRST`
- Or preserve the subquery boundary so the alias remains in scope
### Additional context
The following test shows the bug in action
```rust
#[test]
fn test_sort_on_aliased_column_dropped_by_outer_projection() -> Result<()>
{
let schema = Schema::new(vec![
Field::new("X", DataType::Utf8, true),
Field::new("Y", DataType::Utf8, true),
Field::new("Z", DataType::Utf8, true),
]);
let plan = table_scan(Some("phys_table"), &schema, None)?
.alias("t")?
.project(vec![
Expr::Column(Column::new(Some(TableReference::bare("t")),
"X")).alias("a"),
Expr::Column(Column::new(Some(TableReference::bare("t")),
"Y")).alias("b"),
Expr::Column(Column::new(Some(TableReference::bare("t")),
"Z")).alias("c"),
])?
.sort_with_limit(
vec![Expr::Column(Column::new_unqualified("c")).sort(false,
true)],
Some(1),
)?
.project(vec![
Expr::Column(Column::new_unqualified("a")),
Expr::Column(Column::new_unqualified("b")),
])?
.build()?;
let unparser = Unparser::default();
let sql = unparser.plan_to_sql(&plan)?.to_string();
// This assertion fails: ORDER BY references "c" but the alias was
dropped
assert!(
sql.contains("Z"),
"ORDER BY references 'c' but the alias was dropped: {sql}"
);
Ok(())
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]