barry3406 commented on issue #21507: URL: https://github.com/apache/datafusion/issues/21507#issuecomment-4222788641
Thanks for the triage @asolimando, the Postgres reference makes the expected semantics clear. The bug looks like `functional_dependencies.rs` treats a `UNIQUE` constraint the same way as a `PRIMARY KEY` for FD inference. Per the SQL spec, though, a `UNIQUE` column can hold multiple NULLs, so `(a)` is not actually a unique row identifier when NULLs are present. That's why the GROUP BY collapses the two NULL rows into one. I can put up a fix that distinguishes `UNIQUE` from `PRIMARY KEY` in FD derivation (only emit a functional dependency from `UNIQUE` columns when the column is also `NOT NULL`), with a sqllogictest regression covering the NULL case. Sound right?
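To make the proposed rule concrete, here is a rough standalone sketch of the check I have in mind. It is not DataFusion's actual code: `Constraint` and `fd_determinants` are hypothetical names made up for illustration, and nullability is passed as a plain slice of flags. It also assumes that a multi-column `UNIQUE` key only qualifies when every column in it is `NOT NULL`.

```rust
/// Hypothetical stand-in for a table constraint (not DataFusion's real type).
/// Each variant holds the indices of the columns that form the key.
#[derive(Debug, Clone, PartialEq)]
enum Constraint {
    PrimaryKey(Vec<usize>),
    Unique(Vec<usize>),
}

/// Returns the keys that may safely act as functional-dependency determinants.
///
/// `nullable[i]` says whether column `i` may contain NULL.
/// - A PRIMARY KEY is always accepted, since it implies NOT NULL.
/// - A UNIQUE key is accepted only if none of its columns is nullable,
///   because UNIQUE permits several rows whose key contains NULL.
fn fd_determinants(nullable: &[bool], constraints: &[Constraint]) -> Vec<Vec<usize>> {
    constraints
        .iter()
        .filter_map(|constraint| match constraint {
            Constraint::PrimaryKey(cols) => Some(cols.clone()),
            Constraint::Unique(cols) if cols.iter().all(|&i| !nullable[i]) => {
                Some(cols.clone())
            }
            // Nullable UNIQUE key: not a reliable row identifier, so no FD.
            Constraint::Unique(_) => None,
        })
        .collect()
}
```

With `nullable = [true, false]`, a `Unique(vec![0])` constraint yields no determinant, while `Unique(vec![1])` and `PrimaryKey(vec![0])` both do. The real change would need to plug into whatever structures `functional_dependencies.rs` already uses.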
