Re: [PR] Skip probe-side consumption when hash join build side is empty [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21068: URL: https://github.com/apache/datafusion/pull/21068#discussion_r3026162087 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -855,6 +855,22 @@ pub(crate) fn need_produce_result_in_final(join_type: JoinType) -> bool { ) } +///

Re: [PR] feat: move shuffle writer disk I/O off tokio worker threads [datafusion-ballista]

2026-04-01 Thread via GitHub
martin-g commented on code in PR #1537: URL: https://github.com/apache/datafusion-ballista/pull/1537#discussion_r3026051897 ## ballista/core/src/execution_plans/shuffle_writer.rs: ## @@ -255,96 +252,114 @@ impl ShuffleWriterExec { } Some(Part

[PR] chore(deps): bump libc from 0.2.183 to 0.2.184 [datafusion-ballista]

2026-04-01 Thread via GitHub
dependabot[bot] opened a new pull request, #1538: URL: https://github.com/apache/datafusion-ballista/pull/1538 Bumps [libc](https://github.com/rust-lang/libc) from 0.2.183 to 0.2.184. Release notes Sourced from [libc's releases](https://github.com/rust-lang/libc/releases). 0.2

Re: [PR] Skip probe-side consumption when hash join build side is empty [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21068: URL: https://github.com/apache/datafusion/pull/21068#discussion_r3026168085 ## datafusion/physical-plan/src/joins/utils.rs: ## Review Comment: The state-machine hoist should let us skip `process_probe_batch` entirely for the join type

Re: [PR] feat: feature-gate `sqllogictests` datafusion-substrait behind optional 'substrait' feature [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas commented on PR #21268: URL: https://github.com/apache/datafusion/pull/21268#issuecomment-4174735233 CI is all green now; thanks @alamb for the review. Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] feat: feature-gate `sqllogictests` datafusion-substrait behind optional 'substrait' feature [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas merged PR #21268: URL: https://github.com/apache/datafusion/pull/21268

Re: [I] Feature-gate datafusion-substrait behind optional feature to reduce compile time [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas closed issue #21269: Feature-gate datafusion-substrait behind optional feature to reduce compile time URL: https://github.com/apache/datafusion/issues/21269

Re: [PR] DataFrame API: allow aggregate functions in select() (#17874) [datafusion]

2026-04-01 Thread via GitHub
cj-zhukov commented on PR #21021: URL: https://github.com/apache/datafusion/pull/21021#issuecomment-4174798208 I’ve pushed updates addressing the previous review comments: - Fixed unique alias generation - Added new tests to improve coverage - Added an example with aggregates in the

Re: [I] [DISCUSSION] Future of Dynamic Filters Sync [datafusion]

2026-04-01 Thread via GitHub
jayshrivastava commented on issue #21207: URL: https://github.com/apache/datafusion/issues/21207#issuecomment-4172330409 Sounds good - let's do Tuesday. Btw, would it make sense to discuss this topic during the community meetings on Wednesdays?

[PR] fix: use UTC for Arrow schema timezone in SparkToColumnar conversions [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove opened a new pull request, #3878: URL: https://github.com/apache/datafusion-comet/pull/3878 ## Which issue does this PR close? Closes #2720 ## Rationale for this change `CometSparkToColumnarExec` and `CometLocalTableScanExec` used `conf.sessionLocalTimeZone` wh

Re: [PR] feat: add standalone shuffle benchmark tool and finer-grained shuffle metrics [datafusion-comet]

2026-04-01 Thread via GitHub
mbutrovich commented on code in PR #3752: URL: https://github.com/apache/datafusion-comet/pull/3752#discussion_r3024364468 ## native/shuffle/README.md: ## @@ -23,3 +23,46 @@ This crate provides the shuffle writer and reader implementation for Apache Data of the [Apache DataFus

Re: [PR] feat: add audit-comet-expression Claude Code skill [datafusion-comet]

2026-04-01 Thread via GitHub
martin-g commented on code in PR #3793: URL: https://github.com/apache/datafusion-comet/pull/3793#discussion_r3024341248 ## .claude/skills/audit-comet-expression/SKILL.md: ## @@ -0,0 +1,325 @@ +--- +name: audit-comet-expression +description: Audit an existing Comet expression fo

Re: [PR] [branch-52] Fix push_down_filter for children with non-empty fetch fields (#21057) [datafusion]

2026-04-01 Thread via GitHub
alamb commented on PR #21141: URL: https://github.com/apache/datafusion/pull/21141#issuecomment-4173214414 Thanks again @hareshkh and @AndreaBozzo

Re: [PR] feat: add audit-comet-expression Claude Code skill [datafusion-comet]

2026-04-01 Thread via GitHub
kazuyukitanimura commented on code in PR #3793: URL: https://github.com/apache/datafusion-comet/pull/3793#discussion_r3024925793 ## .claude/skills/audit-comet-expression/SKILL.md: ## @@ -0,0 +1,325 @@ +--- +name: audit-comet-expression +description: Audit an existing Comet expre

Re: [I] date_trunc incorrect results in non-UTC timezone [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on issue #2649: URL: https://github.com/apache/datafusion-comet/issues/2649#issuecomment-4173434302 ## Findings from PR #3877 I attempted a fix in #3877 that addressed both root causes identified in the issue. Here's a summary of the approach and remaining issues

[PR] doc: GetArrayItem is now supported [datafusion-comet]

2026-04-01 Thread via GitHub
kazuyukitanimura opened a new pull request, #3880: URL: https://github.com/apache/datafusion-comet/pull/3880 ## Which issue does this PR close? ## Rationale for this change #3709 fixed the `GetArrayItem` issue. ## What changes are included in this PR? ## How are th

Re: [I] Release DataFusion-Python 53.0.0 [datafusion-python]

2026-04-01 Thread via GitHub
timsaucer commented on issue #1442: URL: https://github.com/apache/datafusion-python/issues/1442#issuecomment-4172242882 Ok, question for both of you: I ran [this skill](https://github.com/apache/datafusion-python/pull/1460) and found a bunch of valid areas where we have APIs upstream that

Re: [PR] doc: Add documentation explaining the behavior of `null` values in struct comparisons [datafusion]

2026-04-01 Thread via GitHub
comphead merged PR #21226: URL: https://github.com/apache/datafusion/pull/21226

Re: [I] All struct-aware optimizations are hardcoded to `GetFieldFunc` [datafusion]

2026-04-01 Thread via GitHub
adriangb commented on issue #21306: URL: https://github.com/apache/datafusion/issues/21306#issuecomment-4172620378 > There's no trait or capability API for a UDF to declare "I'm a struct field access." There is: https://github.com/apache/datafusion/blob/1416ed4d5007180136dae0aaeb921f

[PR] fix: skip Comet columnar shuffle for stages with DPP scans [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove opened a new pull request, #3879: URL: https://github.com/apache/datafusion-comet/pull/3879 **[EXPERIMENTAL]** ## Which issue does this PR close? Closes #3874. ## Rationale for this change When a scan uses Dynamic Partition Pruning (DPP) and falls back to

Re: [PR] fix: date_trunc correct results in non-UTC timezones [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove closed pull request #3877: fix: date_trunc correct results in non-UTC timezones URL: https://github.com/apache/datafusion-comet/pull/3877

Re: [PR] chore: Remove redundant `parquet.enable.dictionary` ConfigMatrix from SQL tests [datafusion-comet]

2026-04-01 Thread via GitHub
kazuyukitanimura commented on PR #3866: URL: https://github.com/apache/datafusion-comet/pull/3866#issuecomment-4173011999 I'm not against this change, but do we plan to add dictionary-encoded Parquet tests in the future?

Re: [I] perf: Consider using async I/O (tokio::fs) in shuffle writer [datafusion-ballista]

2026-04-01 Thread via GitHub
milenkovicm commented on issue #1387: URL: https://github.com/apache/datafusion-ballista/issues/1387#issuecomment-4172051930 It might be a bit hard to measure the benefit of this change, as in most cases writes might not block.

Re: [PR] Add configurable UNION DISTINCT to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
comphead commented on PR #21075: URL: https://github.com/apache/datafusion/pull/21075#issuecomment-4173717958 I agree with @alamb that we should create an initial ticket stating the problem. The PR description is nice, but it describes a solution, whereas a ticket is a problem statement, and some people could also

Re: [PR] Feat: to_json Infinity/-Infinity Nan values support [datafusion-comet]

2026-04-01 Thread via GitHub
kazuyukitanimura commented on PR #3875: URL: https://github.com/apache/datafusion-comet/pull/3875#issuecomment-4172999832 I think we need to add new tests for +/- inf and NaN.

Re: [PR] feat: Add immediate mode option for native shuffle [datafusion-comet]

2026-04-01 Thread via GitHub
milenkovicm commented on PR #3845: URL: https://github.com/apache/datafusion-comet/pull/3845#issuecomment-4172024949 Thanks @andygrove, will have a look.

Re: [PR] feat: feature-gate `sqllogictests` datafusion-substrait behind optional 'substrait' feature [datafusion]

2026-04-01 Thread via GitHub
alamb commented on PR #21268: URL: https://github.com/apache/datafusion/pull/21268#issuecomment-4172713030 The benchmark result seems to pass on main: https://github.com/apache/datafusion/actions/runs/23868036699/job/69592615386 I'll update this branch to try and get a clean C

Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-04-01 Thread via GitHub
comphead commented on PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842#issuecomment-4173731765 @mbutrovich @martin-g PTAL; the `output_rows` for filtered queries shows the same values as Spark.

Re: [PR] [branch-52] Fix push_down_filter for children with non-empty fetch fields (#21057) [datafusion]

2026-04-01 Thread via GitHub
alamb merged PR #21141: URL: https://github.com/apache/datafusion/pull/21141

Re: [PR] Feature/jupyter notebook support [datafusion-ballista]

2026-04-01 Thread via GitHub
milenkovicm commented on PR #1513: URL: https://github.com/apache/datafusion-ballista/pull/1513#issuecomment-4172160728 Thanks @sandugood, will have a look asap.

Re: [PR] feat: enable native_datafusion scan in auto mode [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3781: URL: https://github.com/apache/datafusion-comet/pull/3781#discussion_r3024142259 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -168,8 +168,13 @@ case class CometScanRule(session: SparkSession) COMET_

Re: [PR] feat: enable native_datafusion scan in auto mode [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3781: URL: https://github.com/apache/datafusion-comet/pull/3781#discussion_r3024162164 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -168,8 +168,13 @@ case class CometScanRule(session: SparkSession) COMET_

[I] All struct-aware optimizations are hardcoded to `GetFieldFunc` [datafusion]

2026-04-01 Thread via GitHub
friendlymatthew opened a new issue, #21306: URL: https://github.com/apache/datafusion/issues/21306 ### Is your feature request related to a problem or challenge? Every optimization for struct field querying (row level pushdown, leaf-level projection pruning, `PushdownChecker`, `resolv

Re: [PR] Allow Spark partial / Comet final for compatible aggregates [datafusion-comet]

2026-04-01 Thread via GitHub
Shekharrajak commented on PR #2994: URL: https://github.com/apache/datafusion-comet/pull/2994#issuecomment-4172386159 CI check failure, debugging: df1: count(DISTINCT 2), count(DISTINCT 2, 3) | [1, 1] -- PASS df2: count(DISTINCT 2), count(DISTINCT 3, 2) | [2, 2] -- FAIL

Re: [PR] feat: add standalone shuffle benchmark tool and finer-grained shuffle metrics [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3752: URL: https://github.com/apache/datafusion-comet/pull/3752#discussion_r3024300491 ## native/shuffle/README.md: ## @@ -23,3 +23,46 @@ This crate provides the shuffle writer and reader implementation for Apache Data of the [Apache DataFusi

Re: [PR] feat: add standalone shuffle benchmark tool and finer-grained shuffle metrics [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3752: URL: https://github.com/apache/datafusion-comet/pull/3752#discussion_r3024309975 ## native/shuffle/src/bin/shuffle_bench.rs: ## @@ -0,0 +1,768 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Add configurable UNION DISTINCT to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
xiedeyantu commented on PR #21075: URL: https://github.com/apache/datafusion/pull/21075#issuecomment-4173388045 @comphead Could you help review this PR?

Re: [I] Mark joins don't support null mark columns [datafusion]

2026-04-01 Thread via GitHub
AdamGS commented on issue #21309: URL: https://github.com/apache/datafusion/issues/21309#issuecomment-4173542210 take

[I] Mark joins don't support null mark columns [datafusion]

2026-04-01 Thread via GitHub
AdamGS opened a new issue, #21309: URL: https://github.com/apache/datafusion/issues/21309 From the [Story of Joins](https://www.cs.cmu.edu/~15721-f24/papers/Story_of_Joins.pdf) paper, for queries like: ```sql select Title, ECTS = any (select ECTS from Courses c2 where Lecturer = 12

Re: [I] Release DataFusion-Python 53.0.0 [datafusion-python]

2026-04-01 Thread via GitHub
nuno-faria commented on issue #1442: URL: https://github.com/apache/datafusion-python/issues/1442#issuecomment-4172867572 > Question is: should we wait to get all these in and then start the DF53 release or start the release now and get those all in for 54? I think we can try to merg

Re: [PR] feat: add standalone shuffle benchmark tool and finer-grained shuffle metrics [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3752: URL: https://github.com/apache/datafusion-comet/pull/3752#discussion_r3024320845 ## native/shuffle/src/bin/shuffle_bench.rs: ## @@ -0,0 +1,768 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [I] `array_overlap` correctness issue [datafusion-comet]

2026-04-01 Thread via GitHub
kazuyukitanimura commented on issue #3645: URL: https://github.com/apache/datafusion-comet/issues/3645#issuecomment-4173186658 Related https://github.com/apache/datafusion-comet/issues/3175 Related https://github.com/apache/datafusion-comet/pull/3364

Re: [PR] feat: add standalone shuffle benchmark tool and finer-grained shuffle metrics [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3752: URL: https://github.com/apache/datafusion-comet/pull/3752#discussion_r3024308245 ## native/shuffle/src/bin/shuffle_bench.rs: ## @@ -0,0 +1,768 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: Complete basic `LATERAL JOIN` functionality [datafusion]

2026-04-01 Thread via GitHub
alamb commented on PR #21202: URL: https://github.com/apache/datafusion/pull/21202#issuecomment-4173212689 🤯 @neilconway is on the way to getting our subqueries in shape. Can't wait to see where we end up

Re: [PR] feat: add standalone shuffle benchmark tool and finer-grained shuffle metrics [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on code in PR #3752: URL: https://github.com/apache/datafusion-comet/pull/3752#discussion_r3024304942 ## native/shuffle/src/metrics.rs: ## @@ -33,6 +33,15 @@ pub(crate) struct ShufflePartitionerMetrics { /// Time spent writing to disk. Maps to "shuffleWr

Re: [PR] feat: Add immediate mode option for native shuffle [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on PR #3845: URL: https://github.com/apache/datafusion-comet/pull/3845#issuecomment-4172527414 > * written blocks are not ordered by partition, am i correct (perhaps documentation about format of data file and index file could be added) The final file contains dat

Re: [I] Inefficient DPP fallback for TPC-DS [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove commented on issue #3874: URL: https://github.com/apache/datafusion-comet/issues/3874#issuecomment-4173073364 > The solution is probably to wrap the Spark scan in `CometSparkToColumnarExec` so the rest of the stage is Comet native. This may add too much overhead. Perhaps we

Re: [I] Blog post about 1000 distinct committers / history of the project [datafusion]

2026-04-01 Thread via GitHub
theirix commented on issue #21305: URL: https://github.com/apache/datafusion/issues/21305#issuecomment-4172224921 > > I am wondering how this GitHub contributor list is composed for an Apache-structured project. I don't see my commits there. Does it only include committers/maintainers who a

[I] `ProjectionExec` produces unknown statistics for all `ScalarFunctionExpr` outputs [datafusion]

2026-04-01 Thread via GitHub
friendlymatthew opened a new issue, #21307: URL: https://github.com/apache/datafusion/issues/21307 ### Is your feature request related to a problem or challenge? `ProjectionExec::project_statistics()` only propagates column statistics for plain `Column` references and `Literal` values

[I] [EPIC] first class support for struct field / Variant access in Parquet [datafusion]

2026-04-01 Thread via GitHub
friendlymatthew opened a new issue, #21308: URL: https://github.com/apache/datafusion/issues/21308 This issue tracks the hard problems that block efficient querying of struct field columns in Parquet. To investigate, I used an MRE based on a variant column represented as a nested stru

Re: [PR] Teach row group pruning about struct field predicates [datafusion]

2026-04-01 Thread via GitHub
friendlymatthew commented on PR #21003: URL: https://github.com/apache/datafusion/pull/21003#issuecomment-4172569074 This is probably blocked until we figure out: https://github.com/apache/datafusion/issues/21306

Re: [I] `ProjectionExec` produces unknown statistics for all `ScalarFunctionExpr` outputs [datafusion]

2026-04-01 Thread via GitHub
friendlymatthew commented on issue #21307: URL: https://github.com/apache/datafusion/issues/21307#issuecomment-4172576731 @alamb, any chance you could tag people who are knowledgeable about `ScalarUDFImpl`?

Re: [PR] chore(deps): bump jni from 0.21.1 to 0.22.4 in /native [datafusion-comet]

2026-04-01 Thread via GitHub
parthchandra commented on code in PR #3753: URL: https://github.com/apache/datafusion-comet/pull/3753#discussion_r3024543651 ## native/core/src/execution/jni_api.rs: ## @@ -778,33 +778,31 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_writeSortedFileNative c

Re: [PR] fix: avoid internal errors for OneOf signature mismatches [datafusion]

2026-04-01 Thread via GitHub
myandpr commented on PR #21032: URL: https://github.com/apache/datafusion/pull/21032#issuecomment-4172741676 @alamb @Jefffrey sorry for the delayed reply. I spent some time digging into this further and also looked through the discussion in #20070. I updated the implementation to keep

[I] Add configurable UNION DISTINCT support to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
xiedeyantu opened a new issue, #21310: URL: https://github.com/apache/datafusion/issues/21310 ### Is your feature request related to a problem or challenge? The optimizer currently does not have a configurable rewrite for eligible `UNION DISTINCT` queries that could be collapsed into

Re: [I] Add configurable UNION DISTINCT support to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
xiedeyantu commented on issue #21310: URL: https://github.com/apache/datafusion/issues/21310#issuecomment-4173860974 take

Re: [PR] feat: Refactor DistributedQueryExec to support job callback mechanisim [datafusion-ballista]

2026-04-01 Thread via GitHub
danielhumanmod commented on PR #1536: URL: https://github.com/apache/datafusion-ballista/pull/1536#issuecomment-4173864599 > thanks @danielhumanmod just bear with me, i'm a bit stuck with time, will have a look asap No rush, please take your time! I can keep working on the 2nd part a

[PR] fix: date_trunc correct results in non-UTC timezones [datafusion-comet]

2026-04-01 Thread via GitHub
andygrove opened a new pull request, #3877: URL: https://github.com/apache/datafusion-comet/pull/3877 ## Which issue does this PR close? Closes #2649. ## Rationale for this change `date_trunc` (TruncTimestamp) was producing wrong results for non-UTC timezones due to two

Re: [PR] Estimate aggregate output rows using existing NDV statistics [datafusion]

2026-04-01 Thread via GitHub
2010YOUY01 commented on PR #20926: URL: https://github.com/apache/datafusion/pull/20926#issuecomment-4173976995 > @2010YOUY01, apologies for the direct ping, would you be interested in taking a look? > > Re. the discussion in [#21120 (comment)](https://github.com/apache/datafusion/i

[PR] feat: move shuffle writer disk I/O off tokio worker threads [datafusion-ballista]

2026-04-01 Thread via GitHub
hcrosse opened a new pull request, #1537: URL: https://github.com/apache/datafusion-ballista/pull/1537 ## Which issue does this PR close? Closes #1387. ## Rationale The shuffle writer performs synchronous `std::fs` I/O inside async task contexts. Under concurrent shuffle
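
A minimal sketch of the general idea named in the PR title, not the PR's actual change: blocking file writes are handed to a dedicated thread, so the caller only sends buffers over a channel and never touches the disk itself. It swaps in the Rust standard library (`std::thread` and `std::sync::mpsc`) rather than any tokio facility, and `spawn_writer` is a hypothetical name.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (not part of Ballista): starts a thread that owns the
/// file and performs every blocking write. Callers push byte buffers through
/// the returned sender; dropping the sender ends the writer loop.
fn spawn_writer(path: PathBuf) -> (mpsc::Sender<Vec<u8>>, thread::JoinHandle<io::Result<u64>>) {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let handle = thread::spawn(move || -> io::Result<u64> {
        let mut file = File::create(path)?;
        let mut written = 0u64;
        // The loop ends once every sender has been dropped.
        for chunk in rx {
            file.write_all(&chunk)?;
            written += chunk.len() as u64;
        }
        file.flush()?;
        Ok(written)
    });
    (tx, handle)
}
```

Joining the handle after dropping the sender yields the total number of bytes written, or the first I/O error the writer thread hit.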

Re: [PR] Add configurable UNION DISTINCT to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
xiedeyantu commented on PR #21075: URL: https://github.com/apache/datafusion/pull/21075#issuecomment-4173875676 > I agree with @alamb that we should create an initial ticket stating the problem. The PR description is nice, but it describes a solution, whereas a ticket is a problem statement, and some people could a

[I] Make BatchPartitioner::partition_iter public for downstream async consumers [datafusion]

2026-04-01 Thread via GitHub
hcrosse opened a new issue, #21311: URL: https://github.com/apache/datafusion/issues/21311 ### Is your feature request related to a problem or challenge? `BatchPartitioner::partition` takes a sync `FnMut` closure, which means consumers that need to do I/O with the partitioned batches
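
To illustrate the distinction the issue title draws, here is a sketch using hypothetical stand-in types (`Partitioner`, rows as `u64`), not DataFusion's `BatchPartitioner` or its real signatures. With the callback form, the caller's logic runs inside a synchronous closure; with the iterator form, the caller receives each `(partition, rows)` pair and decides what to do between items.

```rust
/// Hypothetical stand-in for a partitioner; not a DataFusion type.
struct Partitioner {
    num_partitions: usize,
}

impl Partitioner {
    /// Callback style: the caller's work happens inside a sync closure.
    fn partition<F: FnMut(usize, Vec<u64>)>(&self, batch: &[u64], mut f: F) {
        for (idx, rows) in self.partition_iter(batch) {
            f(idx, rows);
        }
    }

    /// Iterator style: the caller pulls (partition index, rows) pairs and is
    /// free to do other work between items. Empty partitions are skipped.
    fn partition_iter(&self, batch: &[u64]) -> impl Iterator<Item = (usize, Vec<u64>)> {
        let mut buckets: Vec<Vec<u64>> = vec![Vec::new(); self.num_partitions];
        for &v in batch {
            // Toy partitioning rule, assumed for illustration only.
            buckets[(v % self.num_partitions as u64) as usize].push(v);
        }
        buckets
            .into_iter()
            .enumerate()
            .filter(|(_, rows)| !rows.is_empty())
    }
}
```

In this sketch the callback form is just a thin wrapper over the iterator form, which is one reason exposing the iterator can be the more flexible choice for downstream consumers.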

Re: [PR] feat: [iceberg] allow native Iceberg scans with non-identity transform residuals [datafusion-comet]

2026-04-01 Thread via GitHub
github-actions[bot] commented on PR #2948: URL: https://github.com/apache/datafusion-comet/pull/2948#issuecomment-4174172768 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [PR] dorny: rs change [datafusion-sandbox]

2026-04-01 Thread via GitHub
github-actions[bot] closed pull request #143: dorny: rs change URL: https://github.com/apache/datafusion-sandbox/pull/143

Re: [PR] dorny: md change [datafusion-sandbox]

2026-04-01 Thread via GitHub
github-actions[bot] closed pull request #142: dorny: md change URL: https://github.com/apache/datafusion-sandbox/pull/142

Re: [PR] chore(deps): bump sphinx-reredirects from 1.0.0 to 1.1.0 in /docs [datafusion-sandbox]

2026-04-01 Thread via GitHub
github-actions[bot] closed pull request #102: chore(deps): bump sphinx-reredirects from 1.0.0 to 1.1.0 in /docs URL: https://github.com/apache/datafusion-sandbox/pull/102

Re: [PR] chore(deps): bump sphinx-reredirects from 1.0.0 to 1.1.0 in /docs [datafusion-sandbox]

2026-04-01 Thread via GitHub
dependabot[bot] commented on PR #102: URL: https://github.com/apache/datafusion-sandbox/pull/102#issuecomment-4174170682 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor ve

Re: [PR] Support optional AS keyword in CTE definitions for Databricks [datafusion-sqlparser-rs]

2026-04-01 Thread via GitHub
funcpp commented on PR #2286: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2286#issuecomment-4174182418 @iffyio Hi! Could you take a look at these Databricks dialect PRs when you have a chance? - #2286 (CTE without AS) - #2287 (! operator) - #2288 (GROUPING SET

Re: [I] Add configurable UNION DISTINCT support to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
comphead commented on issue #21310: URL: https://github.com/apache/datafusion/issues/21310#issuecomment-4174228448 Great, thanks @xiedeyantu for the ticket. DataFusion currently supports a bunch of UNION optimizations (flatten, etc.), but there is no rewrite of UNION into a set of ORs, which makes s

[PR] chore: add `.claude/settings.local.json` to `.gitignore` [datafusion]

2026-04-01 Thread via GitHub
jonahgao opened a new pull request, #21312: URL: https://github.com/apache/datafusion/pull/21312 ## Which issue does this PR close? - N/A ## Rationale for this change `.claude/settings.local.json` is a personal project-specific settings file for Claude Code and sho

Re: [PR] feat(spark): Adds spark round function [datafusion]

2026-04-01 Thread via GitHub
SubhamSinghal commented on PR #21062: URL: https://github.com/apache/datafusion/pull/21062#issuecomment-4174415515 @martin-g @alamb Fixed the build failure. It can be merged now.

Re: [I] Add configurable UNION DISTINCT support to FILTER rewrite optimization [datafusion]

2026-04-01 Thread via GitHub
xiedeyantu commented on issue #21310: URL: https://github.com/apache/datafusion/issues/21310#issuecomment-4174297032 @comphead Thanks for reviewing this issue. I believe this change can help reduce scan I/O. Although it may not significantly reduce the end-to-end query time (since the sub-c

[PR] Make NDV merge order-invariant with multi-input overlap estimation [datafusion]

2026-04-01 Thread via GitHub
kosiew opened a new pull request, #21313: URL: https://github.com/apache/datafusion/pull/21313 ## Which issue does this PR close? * Closes #20966. --- ## Rationale for this change The existing NDV merge implementation (`estimate_ndv_with_overlap`) is not associati

Re: [PR] feat: Refactor DistributedQueryExec to support job callback mechanisim [datafusion-ballista]

2026-04-01 Thread via GitHub
danielhumanmod commented on PR #1536: URL: https://github.com/apache/datafusion-ballista/pull/1536#issuecomment-4174564856 Just a quick thought: Instead of adding handlers, what if we created new Execs and combined them on the client side? For example, for EXPLAIN ANALYZE, we could build a

Re: [PR] feat: Create dynamic filters in SortMergeJoin [datafusion]

2026-04-01 Thread via GitHub
adriangb commented on code in PR #21267: URL: https://github.com/apache/datafusion/pull/21267#discussion_r3025836766 ## datafusion/physical-plan/src/limit.rs: ## @@ -436,6 +467,29 @@ impl ExecutionPlan for LocalLimitExec { fn cardinality_effect(&self) -> CardinalityEffect {

Re: [PR] functions-aggregate: Support dictionary scalar coercion for min/max [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21151: URL: https://github.com/apache/datafusion/pull/21151#discussion_r3025716158 ## datafusion/functions-aggregate-common/src/min_max.rs: ## @@ -413,6 +413,31 @@ macro_rules! min_max { min_max_generic!(lhs, rhs, $OP)

Re: [PR] functions-aggregate: Support dictionary scalar coercion for min/max [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21151: URL: https://github.com/apache/datafusion/pull/21151#discussion_r3025725747 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -1270,4 +1270,104 @@ mod tests { assert_eq!(max_result, ScalarValue::Utf8(Some("🦀".to_string())));

Re: [PR] functions-aggregate: Support dictionary scalar coercion for min/max [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21151: URL: https://github.com/apache/datafusion/pull/21151#discussion_r3025716160 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -1270,4 +1270,104 @@ mod tests { assert_eq!(max_result, ScalarValue::Utf8(Some("🦀".to_string())));

Re: [PR] fix: preserve subquery structure when unparsing SubqueryAlias over Ag… [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21099: URL: https://github.com/apache/datafusion/pull/21099#discussion_r3025803279 ## datafusion/sql/src/unparser/plan.rs: ## @@ -828,6 +828,27 @@ impl Unparser<'_> { Some(plan_alias.alias.clone()), select

Re: [PR] feat: support dynamic filter pushdown through SortMergeJoinExec [datafusion]

2026-04-01 Thread via GitHub
mdashti closed pull request #20455: feat: support dynamic filter pushdown through SortMergeJoinExec URL: https://github.com/apache/datafusion/pull/20455

Re: [PR] fix(datasource): keep stats absent when collect_stats is false [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21149: URL: https://github.com/apache/datafusion/pull/21149#discussion_r3025756006 ## datafusion/datasource/src/statistics.rs: ## @@ -320,16 +320,19 @@ pub async fn get_statistics_with_limit( file.statistics = Some(Arc::clone(&file_stats)

Re: [PR] Add ExpressionAnalyzer for pluggable expression-level statistics estimation [datafusion]

2026-04-01 Thread via GitHub
kosiew commented on code in PR #21122: URL: https://github.com/apache/datafusion/pull/21122#discussion_r3025779809 ## datafusion/physical-expr/src/projection.rs: ## @@ -713,9 +745,35 @@ impl ProjectionExprs { byte_size, }

[I] feat: Support `RESET ALL` Command [datafusion]

2026-04-01 Thread via GitHub
erenavsarogullari opened a new issue, #21314: URL: https://github.com/apache/datafusion/issues/21314 ### Is your feature request related to a problem or challenge? Currently, DataFusion supports `RESET` Command when resetting the configuration to its default value. PostgreSQL and

Re: [I] feat: Support `RESET ALL` Command [datafusion]

2026-04-01 Thread via GitHub
erenavsarogullari commented on issue #21314: URL: https://github.com/apache/datafusion/issues/21314#issuecomment-4174528015 take

Re: [PR] Enable `!` as NOT operator for Databricks dialect [datafusion-sqlparser-rs]

2026-04-01 Thread via GitHub
iffyio merged PR #2287: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2287

Re: [PR] feat: multiple columns in count distinct [datafusion]

2026-04-01 Thread via GitHub
Mark1626 commented on PR #20460: URL: https://github.com/apache/datafusion/pull/20460#issuecomment-4174552060 Bumping this up, any review comments on this?

Re: [PR] feat: Additional Canonical Extension Types [datafusion]

2026-04-01 Thread via GitHub
paleolimbot commented on code in PR #21291: URL: https://github.com/apache/datafusion/pull/21291#discussion_r3025481991 ## datafusion/common/src/types/canonical_extensions/bool8.rs: ## @@ -0,0 +1,133 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Skip probe-side consumption when hash join build side is empty [datafusion]

2026-04-01 Thread via GitHub
adriangb commented on code in PR #21068: URL: https://github.com/apache/datafusion/pull/21068#discussion_r3025907085 ## datafusion/physical-plan/src/joins/utils.rs: ## Review Comment: This method is currently called from `HashJoinStream::process_probe_batch`. Should we rem

Re: [PR] ensure dynamic filters are correctly pushed down through aggregations [datafusion]

2026-04-01 Thread via GitHub
adriangb commented on PR #21059: URL: https://github.com/apache/datafusion/pull/21059#issuecomment-4174666570 Triggering CI. Assuming it passes, LGTM.

Re: [PR] feat: Add Semi/Anti join to PiecewiseMergeJoin [datafusion]

2026-04-01 Thread via GitHub
adriangb commented on PR #18392: URL: https://github.com/apache/datafusion/pull/18392#issuecomment-4174671266 Hi folks, just pinging here to see if we can avoid this falling through the cracks?

Re: [PR] Adds INList and Between expr to skip outer join [datafusion]

2026-04-01 Thread via GitHub
adriangb commented on code in PR #21303: URL: https://github.com/apache/datafusion/pull/21303#discussion_r3025917556 ## datafusion/optimizer/src/eliminate_outer_join.rs: ## @@ -436,6 +454,221 @@ mod tests { ") } +#[test] +fn eliminate_left_with_in_list()

Re: [I] [Feature Request] Opt-in lenient schema mode to restore DF 51 SchemaAdapter behavior for Parquet reading [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas commented on issue #21290: URL: https://github.com/apache/datafusion/issues/21290#issuecomment-4174676114 Thanks, I found a workaround in our custom system, and it is working well.

Re: [PR] feat: generate reversed-name data for sort pushdown benchmark [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas commented on PR #21266: URL: https://github.com/apache/datafusion/pull/21266#issuecomment-4174691492 @adriangb Good suggestion! Reusing TPC-H data with multiple parts and renaming is much simpler. Plan: 1. Generate TPC-H with `--parts=3` (files have non-overlapping l_order

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-04-01 Thread via GitHub
Dandandan commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3020265729 ## datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part: ## @@ -61,40 +61,41 @@ logical_plan 03)Aggregate: groupBy=[[custsale.cntrycode]], aggr=[[coun

[I] [Feature Request] Opt-in lenient schema mode to restore DF 51 SchemaAdapter behavior for Parquet reading [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas opened a new issue, #21290: URL: https://github.com/apache/datafusion/issues/21290 ## Problem After upgrading from DataFusion 51 to 52, we hit several hard errors reading Parquet files that worked fine in DF 51. The root cause is that **DF 52 removed `SchemaAdapter`** ([P

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-04-01 Thread via GitHub
Dandandan commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3020277987 ## datafusion/physical-expr/src/scalar_subquery.rs: ## @@ -0,0 +1,222 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [I] [Feature Request] Opt-in lenient schema mode to restore DF 51 SchemaAdapter behavior for Parquet reading [datafusion]

2026-04-01 Thread via GitHub
zhuqi-lucas commented on issue #21290: URL: https://github.com/apache/datafusion/issues/21290#issuecomment-4168076107 cc @adriangb @alamb @xudong963

Re: [PR] feat: Create dynamic filters in SortMergeJoin [datafusion]

2026-04-01 Thread via GitHub
Dandandan commented on PR #21267: URL: https://github.com/apache/datafusion/pull/21267#issuecomment-4168260905 run benchmark tpch tpcds ``` env: PREFER_HASH_JOIN: false ```

Re: [PR] feat: Create dynamic filters in SortMergeJoin [datafusion]

2026-04-01 Thread via GitHub
adriangbot commented on PR #21267: URL: https://github.com/apache/datafusion/pull/21267#issuecomment-4168274564 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21267#issuecomment-4168260905) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] feat: Create dynamic filters in SortMergeJoin [datafusion]

2026-04-01 Thread via GitHub
adriangbot commented on PR #21267: URL: https://github.com/apache/datafusion/pull/21267#issuecomment-4168275332 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21267#issuecomment-4168260905) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-
