kosiew commented on code in PR #21099:
URL: https://github.com/apache/datafusion/pull/21099#discussion_r3025803279
##########
datafusion/sql/src/unparser/plan.rs:
##########
@@ -828,6 +828,27 @@ impl Unparser<'_> {
Some(plan_alias.alias.clone()),
select.already_projected(),
)?;
+
+ // If the SubqueryAlias directly wraps a plan that builds its
+ // own SELECT clauses (e.g. Aggregate adds GROUP BY, Window
adds
+ // OVER, etc.) and unparse_table_scan_pushdown couldn't reduce
it,
+ // we must emit a derived subquery: (SELECT ...) AS alias.
+ // Without this, the recursive handler would merge those
clauses
+ // into the outer SELECT, losing the subquery structure
entirely.
+ if unparsed_table_scan.is_none()
+ &&
Self::requires_derived_subquery(plan_alias.input.as_ref())
Review Comment:
I think there is still a subtle issue here.
`subquery_alias_inner_query_and_columns()` can already unwrap
`plan_alias.input` into a different inner `plan`, for example turning a
`Projection -> Aggregate` into just the aggregate when column aliases are
involved.
This guard is still checking `plan_alias.input`, though. Because of that,
cases like projected aggregates, windows, sorts, limits, or unions can skip the
derived subquery path and end up being flattened into the outer SELECT.
That brings back the original correctness issue for parser generated queries
like `FROM (SELECT max(j1_id) FROM j1) AS c(id)`.
Can we key this check, and the `derive(...)` call, off the rewritten `plan`
instead? That should make the behavior consistent with the alias rewrite logic.
##########
datafusion/sql/tests/cases/plan_to_sql.rs:
##########
@@ -2893,3 +2893,58 @@ fn test_json_access_3() {
@r#"SELECT (j1.j1_string : 'field.inner1[''inner2'']') FROM j1"#
);
}
+
+/// Test that unparsing a manually constructed join with a subquery aggregate
+/// preserves the MAX aggregate function.
+///
+/// Builds the equivalent of:
+/// SELECT j1.j1_string FROM j1
+/// JOIN (SELECT max(j2_id) AS max_id FROM j2) AS b
+/// ON j1.j1_id = b.max_id
+#[test]
+fn test_unparse_manual_join_with_subquery_aggregate() -> Result<()> {
Review Comment:
This test is great and definitely helps 👍
One small suggestion. Right now it only covers a `SubqueryAlias` wrapping an
`Aggregate` directly. Since the alias rewrite helper can also turn `Projection
-> Aggregate` shapes into an inner aggregate, it might be worth adding a
roundtrip test for a more realistic SQL case with column aliases.
Something like `SELECT id FROM (SELECT max(j1_id) FROM j1) AS c(id)` would
help lock down that edge case as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]