Whatsonyourmind commented on issue #21310: URL: https://github.com/apache/datafusion/issues/21310#issuecomment-4185489300
@xiedeyantu One additional edge case for the rewrite rule: if either branch of the UNION contains a `LIMIT` clause, the transformation is invalid. `(SELECT a FROM t WHERE x LIMIT 5) UNION (SELECT a FROM t WHERE y LIMIT 5)` cannot be rewritten as `SELECT DISTINCT a FROM t WHERE x OR y LIMIT 10`, because each LIMIT applies to its own branch before the UNION deduplicates, not to the combined result. The two branches may return overlapping rows that are then deduplicated, so the original query can return fewer than 10 rows even when the rewritten query returns 10. The optimizer rule should check for the absence of LIMIT, ORDER BY, and window functions in both branches before applying the transformation. The same applies to OFFSET: any row-limiting operation interacts with deduplication in order-dependent ways.
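To make the LIMIT counterexample concrete, here is a minimal sketch that runs both query shapes against an in-memory SQLite database. SQLite is only a stand-in for DataFusion here, and the table contents are hypothetical, chosen so that the row counts differ no matter which rows each `LIMIT 5` happens to pick. SQLite does not accept a parenthesized `LIMIT` branch directly inside a `UNION`, so each branch is wrapped in a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, x INTEGER, y INTEGER)")

# Hypothetical data: 20 rows match only x, with distinct values a = 1..20.
conn.executemany("INSERT INTO t VALUES (?, 1, 0)", [(i,) for i in range(1, 21)])
# 10 rows match only y, and all of them have a = 1.
conn.executemany("INSERT INTO t VALUES (?, 0, 1)", [(1,)] * 10)

# Original shape: LIMIT runs inside each branch, then UNION deduplicates.
original = conn.execute(
    """
    SELECT a FROM (SELECT a FROM t WHERE x = 1 LIMIT 5)
    UNION
    SELECT a FROM (SELECT a FROM t WHERE y = 1 LIMIT 5)
    """
).fetchall()

# Proposed rewrite: deduplicate first, then apply a single LIMIT.
rewritten = conn.execute(
    "SELECT DISTINCT a FROM t WHERE x = 1 OR y = 1 LIMIT 10"
).fetchall()

# The y branch collapses to the single value 1 after dedup, so the original
# yields 5 or 6 rows; the rewrite sees 20 distinct values and yields 10.
print("original rows: ", len(original))
print("rewritten rows:", len(rewritten))
```

The y branch contributes five rows that all collapse to one value, which is exactly the "LIMIT before dedup" effect described above: the rewritten query cannot reproduce it because its single LIMIT runs after DISTINCT.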
