andygrove opened a new issue, #4122: URL: https://github.com/apache/datafusion-comet/issues/4122
## Describe the bug

On Spark 4.1.1 with Comet enabled, two `SQLQueryTestSuite` queries return incorrect results. The same `.sql` and golden `.out` files pass on Spark 4.0.2.

### `except-all.sql` query #22

```sql
SELECT v FROM tab3 GROUP BY v
EXCEPT ALL
SELECT k FROM tab4 GROUP BY k
```

Expected output: `3`. Actual output: `2\n3` (one extra row).

### `intersect-all.sql` query #15

```sql
SELECT v FROM tab1 GROUP BY v
INTERSECT ALL
SELECT k FROM tab2 GROUP BY k
```

Expected output: `2\n3\nNULL`. Actual output: empty result.

## Steps to reproduce

Run Spark 4.1.1's SQL test suite with Comet enabled (the `Spark SQL Tests` matrix entry for 4.1.1). Both files fail in `SQLQueryTestSuite`.

## Expected behavior

Comet should produce the same EXCEPT ALL / INTERSECT ALL results as Spark.

## Workaround

Both files are currently disabled when Comet is enabled via `--SET spark.comet.enabled = false` at the top of each file in `dev/diffs/4.1.1.diff`.

## Additional context

The input `.sql` files and golden `.out` files are byte-identical between Spark 4.0.2 and 4.1.1, so the regression is in either Spark planner/optimizer behavior or in Comet's interaction with it on 4.1.

PR #4093 enables Spark 4.1.1 in the `Spark SQL Tests` workflow.
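For reference, the multiset semantics that EXCEPT ALL and INTERSECT ALL are expected to follow can be sketched in a few lines of Python. This is only an illustration of the set-operator semantics, not of how Spark or Comet implement them. The helper names `except_all` and `intersect_all` are hypothetical, and the input lists are assumed values chosen to be consistent with the expected outputs quoted above; they are not read from the test tables.

```python
from collections import Counter

def except_all(left, right):
    """Multiset difference: each right-side row cancels at most one matching left-side row."""
    return list((Counter(left) - Counter(right)).elements())

def intersect_all(left, right):
    """Multiset intersection: each value is kept min(left count, right count) times."""
    return list((Counter(left) & Counter(right)).elements())

# Assumed (hypothetical) distinct values produced by each GROUP BY branch.
# None stands in for SQL NULL; set operators treat NULLs as equal to each other.
except_left, except_right = [2, 3], [1, 2]
intersect_left, intersect_right = [2, 3, None], [1, 2, 3, None]

print(except_all(except_left, except_right))           # [3]
print(intersect_all(intersect_left, intersect_right))  # [2, 3, None]

# With duplicates, EXCEPT ALL removes only as many copies as the right side has.
print(except_all([1, 1, 2], [1]))                      # [1, 2]
```

Under these assumed inputs, the reported `2\n3` for the EXCEPT ALL query would correspond to the right-hand branch failing to cancel the `2`, and the empty INTERSECT ALL result to no values matching at all, which may help narrow down where the plans diverge.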
