This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 4ae19ebce1 fix: update clickbench expected plan for NDV-aware optimization (#21050)
4ae19ebce1 is described below
commit 4ae19ebce11b02fc73d37d25dacc07d36c7221ef
Author: Alessandro Solimando <[email protected]>
AuthorDate: Thu Mar 19 13:41:52 2026 +0100
fix: update clickbench expected plan for NDV-aware optimization (#21050)
## Which issue does this PR close?
Fixes CI breakage on `main` introduced by #19957.
## Rationale for this change
#19957 introduced extraction of NDV (number of distinct values) statistics
from Parquet metadata. The optimizer now sees NDV=1 for `HitColor`,
`BrowserCountry`, and `BrowserLanguage` in the clickbench test file, so it
short-circuits each `COUNT(DISTINCT)` to a constant projection and skips the
full table scan.
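
To illustrate the idea, here is a minimal, self-contained sketch of how a `COUNT(DISTINCT col)` could be answered from column statistics alone. It is not the code from #19957: `ColumnStats`, `count_distinct_from_stats`, and `try_constant_projection` are hypothetical names, the `Precision` enum is only loosely modelled on DataFusion's statistics type, and the rule that only an exact NDV may be folded into a constant is an assumption made for this sketch.

```rust
/// How far a statistic can be trusted (hypothetical, simplified type).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Per-column statistics as they might be read from file metadata (hypothetical).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    distinct_count: Precision,
}

/// Returns the constant that can stand in for COUNT(DISTINCT col),
/// or None when the statistic is not exact (assumption of this sketch).
fn count_distinct_from_stats(stats: &ColumnStats) -> Option<usize> {
    match stats.distinct_count {
        Precision::Exact(n) => Some(n),
        _ => None,
    }
}

/// Builds the (alias, constant) pairs of a projection that replaces the
/// aggregate. A single column without an exact NDV yields None, meaning
/// the regular aggregate-over-scan plan has to be kept.
fn try_constant_projection(columns: &[(&str, ColumnStats)]) -> Option<Vec<(String, usize)>> {
    columns
        .iter()
        .map(|(name, stats)| {
            count_distinct_from_stats(stats)
                .map(|n| (format!("count(DISTINCT hits.{name})"), n))
        })
        .collect()
}
```

With an exact NDV of 1 for all three columns, `try_constant_projection` returns three `(alias, 1)` pairs, which corresponds to the `1 as count(DISTINCT ...)` expressions in the new plan. If any column lacks an exact NDV, it returns `None`.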
## What changes are included in this PR?
Updates the expected EXPLAIN plan in `clickbench.slt` to match the new
(better) physical plan:
```diff
- 01)AggregateExec: mode=Single, gby=[], aggr=[count(DISTINCT hits.HitColor), ...]
- 02)--DataSourceExec: file_groups={1 group: [...]}, projection=[HitColor, BrowserLanguage, BrowserCountry], file_type=parquet
+ 01)ProjectionExec: expr=[1 as count(DISTINCT hits.HitColor), 1 as count(DISTINCT hits.BrowserCountry), 1 as count(DISTINCT hits.BrowserLanguage)]
+ 02)--PlaceholderRowExec
```
## Are these changes tested?
This PR *is* the test fix. Verified locally with
`cargo test --profile ci -p datafusion-sqllogictest --test sqllogictests`.
## Are there any user-facing changes?
No.
---
datafusion/sqllogictest/test_files/clickbench.slt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/datafusion/sqllogictest/test_files/clickbench.slt b/datafusion/sqllogictest/test_files/clickbench.slt
index 42f066a80d..4e9849e365 100644
--- a/datafusion/sqllogictest/test_files/clickbench.slt
+++ b/datafusion/sqllogictest/test_files/clickbench.slt
@@ -1203,8 +1203,8 @@ logical_plan
02)--SubqueryAlias: hits
03)----TableScan: hits_raw projection=[HitColor, BrowserLanguage, BrowserCountry]
physical_plan
-01)AggregateExec: mode=Single, gby=[], aggr=[count(DISTINCT hits.HitColor), count(DISTINCT hits.BrowserCountry), count(DISTINCT hits.BrowserLanguage)]
-02)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/clickbench_hits_10.parquet]]}, projection=[HitColor, BrowserLanguage, BrowserCountry], file_type=parquet
+01)ProjectionExec: expr=[1 as count(DISTINCT hits.HitColor), 1 as count(DISTINCT hits.BrowserCountry), 1 as count(DISTINCT hits.BrowserLanguage)]
+02)--PlaceholderRowExec
query III
SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), COUNT(DISTINCT "BrowserLanguage") FROM hits;