Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on code in PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#discussion_r2906589224 ## native/core/src/execution/jni_api.rs: ## @@ -952,6 +952,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_columnarToRowConvert( ) -> jni::

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
mbutrovich commented on code in PR #20806: URL: https://github.com/apache/datafusion/pull/20806#discussion_r2906925439 ## datafusion/physical-plan/src/joins/semi_anti_sort_merge_join/stream.rs: ## @@ -0,0 +1,1160 @@ +// Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
alamb commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r2906927341 ## datafusion/datasource/src/morsel.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

Re: [PR] Rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on code in PR #20749: URL: https://github.com/apache/datafusion/pull/20749#discussion_r2898345002 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -3490,7 +3490,9 @@ pub struct Aggregate { pub input: Arc, /// Grouping expressions pub group_exp
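
The rewrite named in this thread's title rests on a simple algebraic identity. The following is a minimal sketch of that identity only, not the PR's implementation. It assumes nullable `i64` values, SQL-style NULL handling (SUM and COUNT skip NULLs, SUM over no non-NULL rows is NULL), and no overflow. All function names are hypothetical.

```rust
/// SUM over nullable values: NULL inputs are skipped, and the result is
/// NULL (None) when there is no non-NULL input.
fn sql_sum(values: &[Option<i64>]) -> Option<i64> {
    let mut acc: Option<i64> = None;
    for v in values.iter().flatten() {
        acc = Some(acc.unwrap_or(0) + v);
    }
    acc
}

/// COUNT(expr): the number of non-NULL inputs.
fn sql_count(values: &[Option<i64>]) -> i64 {
    values.iter().flatten().count() as i64
}

/// Original form: SUM(expr + scalar), adding the scalar to every row first.
fn sum_of_shifted(values: &[Option<i64>], scalar: i64) -> Option<i64> {
    let shifted: Vec<Option<i64>> = values.iter().map(|v| v.map(|x| x + scalar)).collect();
    sql_sum(&shifted)
}

/// Rewritten form: SUM(expr) + scalar * COUNT(expr).
fn rewritten(values: &[Option<i64>], scalar: i64) -> Option<i64> {
    sql_sum(values).map(|s| s + scalar * sql_count(values))
}
```

Under these assumptions both forms agree for any input, including the all-NULL case where both return NULL. The rewritten form adds the scalar once per group instead of once per row.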

[PR] Spark dayname function implementation [datafusion]

2026-03-09 Thread via GitHub
kazantsev-maksim opened a new pull request, #20825: URL: https://github.com/apache/datafusion/pull/20825 ## Which issue does this PR close? N/A ## Rationale for this change Add new spark function: https://spark.apache.org/docs/latest/api/sql/index.html#dayname ##

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
alamb commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4025539235 What I am planning to do next in this PR (after I finish up some other things for work) is 1. Integrate in "morsels" into the Parquet opener flow (as a proof of concept) 2. Sketc

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r2907084995 ## datafusion/datasource/src/morsel.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on PR #20426: URL: https://github.com/apache/datafusion/pull/20426#issuecomment-4026458043 > I'm not sure if I can give meaningful contribution in this case, ballista is relying on datafusion for this behaviour. > > would we need to capture change of behaviour some

Re: [PR] feat: Reduce allocations for aggregating `Statistics` [datafusion]

2026-03-09 Thread via GitHub
jonathanc-n commented on code in PR #20768: URL: https://github.com/apache/datafusion/pull/20768#discussion_r2907643585 ## datafusion/common/src/utils/aggregate.rs: ## @@ -0,0 +1,117 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
mbutrovich commented on code in PR #20806: URL: https://github.com/apache/datafusion/pull/20806#discussion_r2907668997 ## datafusion/physical-plan/src/joins/semi_anti_sort_merge_join/stream.rs: ## @@ -0,0 +1,1218 @@ +// Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-09 Thread via GitHub
milenkovicm commented on PR #20426: URL: https://github.com/apache/datafusion/pull/20426#issuecomment-4026563388 Maybe sending an email to dev list for visibility would make sense, would it? -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Support '>', '<', '>=', '<=', '<>' in any operator [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on code in PR #20830: URL: https://github.com/apache/datafusion/pull/20830#discussion_r2907975070 ## datafusion/sql/src/expr/mod.rs: ## @@ -612,16 +612,24 @@ impl SqlToRel<'_, S> { planner_context, ), _ =>

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4026969643 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4026969381 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark clickbench_pa

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4026973928 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4026973719 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark tpch_sf1.json

[PR] feat: enable debug assertions in CI profile and RUST_BACKTRACE [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove opened a new pull request, #3652: URL: https://github.com/apache/datafusion-comet/pull/3652 ## Summary - Enable `debug-assertions = true` in the CI cargo profile to catch potential issues with unsafe code at test time - Enable `RUST_BACKTRACE=1` in all CI workflows to get ful
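
As a rough illustration of why enabling debug assertions matters for unsafe code, the sketch below guards an unsafe read with `debug_assert!`. The check runs only in builds where debug assertions are enabled; otherwise it is compiled out. This is an assumed example, not code from the PR, and `read_u32_le` is a hypothetical helper.

```rust
/// Reads a little-endian u32 starting at `offset` without requiring the
/// address to be 4-byte aligned.
///
/// # Safety
/// The caller must guarantee that `offset + 4 <= bytes.len()`.
unsafe fn read_u32_le(bytes: &[u8], offset: usize) -> u32 {
    // Checked only when debug assertions are enabled; a profile without
    // them skips this line entirely, so a bad offset would go unnoticed.
    debug_assert!(offset + 4 <= bytes.len(), "read past end of buffer");
    // SAFETY: bounds are the caller's responsibility (see above);
    // `read_unaligned` places no alignment requirement on the pointer.
    let raw = unsafe { std::ptr::read_unaligned(bytes.as_ptr().add(offset) as *const u32) };
    u32::from_le(raw)
}
```

With the assertion active in the test profile, an out-of-bounds call panics with a clear message instead of silently reading invalid memory.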

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r2908022899 ## datafusion/datasource/src/morsel.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

[PR] sql: avoid duplicate projection-name errors in set-op operands [datafusion]

2026-03-09 Thread via GitHub
SergioChan opened a new pull request, #20833: URL: https://github.com/apache/datafusion/pull/20833 ## Summary - fix planning for set-operation operands (`UNION` / `EXCEPT` / `INTERSECT`) when a non-left operand has repeated unnamed projection literals - on duplicate projection-name pla

Re: [PR] Allow filters on struct fields to be pushed down into Parquet scan [datafusion]

2026-03-09 Thread via GitHub
cetra3 commented on code in PR #20822: URL: https://github.com/apache/datafusion/pull/20822#discussion_r2908036318 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -294,6 +308,42 @@ impl<'schema> PushdownChecker<'schema> { } } +/// Checks whether a s

Re: [PR] Allow filters on struct fields to be pushed down into Parquet scan [datafusion]

2026-03-09 Thread via GitHub
cetra3 commented on code in PR #20822: URL: https://github.com/apache/datafusion/pull/20822#discussion_r2908034845 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -368,6 +418,47 @@ impl TreeNodeVisitor<'_> for PushdownChecker<'_> { type Node = Arc; fn f_dow

Re: [PR] Enable debug assertions in CI. [datafusion]

2026-03-09 Thread via GitHub
stuhood commented on PR #20832: URL: https://github.com/apache/datafusion/pull/20832#issuecomment-4027029835 This did not fail because `sqllogictest-sqlite` did not run for some reason: likely because `.github/workflows/labeler/labeler-config.yml` will not match for the edits that have been

Re: [PR] Enable debug assertions in CI. [datafusion]

2026-03-09 Thread via GitHub
mbutrovich commented on PR #20832: URL: https://github.com/apache/datafusion/pull/20832#issuecomment-4027020228 +1 from me, I just raised this issue with @andygrove on Comet today when we are adding `debug_assert` around some code and I commented that "we don't run that in CI anyway."

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r2908038777 ## datafusion/datasource/src/morsel.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4027038769 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark tpcds_sf1.jso

Re: [PR] Enable debug assertions in CI. [datafusion]

2026-03-09 Thread via GitHub
stuhood commented on PR #20832: URL: https://github.com/apache/datafusion/pull/20832#issuecomment-4027048631 > It should already be enabled for `[profile.ci]` though, right? I don't mind it being explicit, just curious. Yea, it looks like you are right: https://doc.rust-lang.org/carg

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
comphead commented on PR #20806: URL: https://github.com/apache/datafusion/pull/20806#issuecomment-4025063993 One thing to mention: to control the flag `datafusion.optimizer.enable_semi_anti_sort_merge_join` or `DATAFUSION_OPTIMIZER_ENABLE_SEMI_ANTI_SORT_MERGE_JOIN` in `bench.sh`, at least

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r2906896138 ## datafusion/datasource/src/morsel.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Allow filters on struct fields to be pushed down into Parquet scan [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on PR #20822: URL: https://github.com/apache/datafusion/pull/20822#issuecomment-4025483431 Let's leave this open for a day to collect feedback 😃

Re: [I] Release DataFusion `53.0.0` (Feb 2026 / Mar 2026) [datafusion]

2026-03-09 Thread via GitHub
milenkovicm commented on issue #19692: URL: https://github.com/apache/datafusion/issues/19692#issuecomment-4025493986 @benbellick is investigating at the moment

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
mbutrovich commented on code in PR #20806: URL: https://github.com/apache/datafusion/pull/20806#discussion_r2907131246 ## datafusion/physical-plan/src/joins/semi_anti_sort_merge_join/stream.rs: ## @@ -0,0 +1,1160 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[PR] chore(deps): pin substrait to `0.62.2` [datafusion]

2026-03-09 Thread via GitHub
milenkovicm opened a new pull request, #20827: URL: https://github.com/apache/datafusion/pull/20827 ## Which issue does this PR close? Relates to #20785 ## Rationale for this change substrait has released patch release 0.62.3 which is backward incompatible, which breaks

Re: [I] Release DataFusion `53.0.0` (Feb 2026 / Mar 2026) [datafusion]

2026-03-09 Thread via GitHub
milenkovicm commented on issue #19692: URL: https://github.com/apache/datafusion/issues/19692#issuecomment-4026230209 created #20827 with pinned version

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4026228743 run benchmark clickbench_partitioned tpch tpcds DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

Re: [I] Support ANY operator [datafusion]

2026-03-09 Thread via GitHub
buraksenn commented on issue #2548: URL: https://github.com/apache/datafusion/issues/2548#issuecomment-4026245610 take

[PR] Add benchmark for struct field filter pushdown in Parquet [datafusion]

2026-03-09 Thread via GitHub
friendlymatthew opened a new pull request, #20829: URL: https://github.com/apache/datafusion/pull/20829 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/20828 ## Rationale for this change This PR adds a series of benchmarks that comp

Re: [PR] fix/substrait-lambda-rextype-0.62.3+ [datafusion]

2026-03-09 Thread via GitHub
milenkovicm commented on PR #20785: URL: https://github.com/apache/datafusion/pull/20785#issuecomment-4026243931 I have pinned version of substrait for df.53 in #20827 if no better suggestion

Re: [I] Support ALL operator [datafusion]

2026-03-09 Thread via GitHub
buraksenn commented on issue #2547: URL: https://github.com/apache/datafusion/issues/2547#issuecomment-4026245096 take

[I] Add benchmarks for struct field filter pushdown in Parquet [datafusion]

2026-03-09 Thread via GitHub
friendlymatthew opened a new issue, #20828: URL: https://github.com/apache/datafusion/issues/20828 https://github.com/apache/datafusion/pull/20822 allows us to push struct field filters down to the parquet source. It would be good to have a series of benchmarks so we can measure the impact

Re: [PR] Rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` [datafusion]

2026-03-09 Thread via GitHub
alamb commented on PR #20749: URL: https://github.com/apache/datafusion/pull/20749#issuecomment-4026309324 > I'm concerned that we start to have multiple optimizers that have to coordinate (namely CSE, the expression pushdown for get_field and this) but I don't have any suggestions for a be

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-09 Thread via GitHub
milenkovicm commented on PR #20426: URL: https://github.com/apache/datafusion/pull/20426#issuecomment-4026321284 I'm not sure if I can give meaningful contribution in this case, ballista is relying on datafusion for this behaviour. would we need to capture change of behaviour somewh

Re: [PR] Rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` [datafusion]

2026-03-09 Thread via GitHub
alamb commented on code in PR #20749: URL: https://github.com/apache/datafusion/pull/20749#discussion_r2907571484 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -3490,7 +3490,9 @@ pub struct Aggregate { pub input: Arc, /// Grouping expressions pub group_expr:

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
rluvaton commented on code in PR #20806: URL: https://github.com/apache/datafusion/pull/20806#discussion_r2907574950 ## datafusion/physical-plan/src/joins/semi_anti_sort_merge_join/stream.rs: ## @@ -0,0 +1,1218 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

Re: [PR] Make Clickbench Q29 5x faster for datafusion [datafusion]

2026-03-09 Thread via GitHub
alamb commented on PR #15532: URL: https://github.com/apache/datafusion/pull/15532#issuecomment-4026332826 Here is a new proposal: - https://github.com/apache/datafusion/pull/20749

Re: [PR] feat : support spark compatible int to timestamp cast [datafusion]

2026-03-09 Thread via GitHub
coderfender commented on code in PR #20555: URL: https://github.com/apache/datafusion/pull/20555#discussion_r2907577014 ## datafusion/spark/src/function/conversion/cast.rs: ## @@ -0,0 +1,633 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

[PR] Support '>', '<', '>=', '<=' in any operator [datafusion]

2026-03-09 Thread via GitHub
buraksenn opened a new pull request, #20830: URL: https://github.com/apache/datafusion/pull/20830 ## Which issue does this PR close? - Closes #2548. ## Rationale for this change ANY operator only supports equality check ## What changes are included in this PR? Adds supp

Re: [PR] Fixed CHANGES keyword parsing for snowflake [datafusion-sqlparser-rs]

2026-03-09 Thread via GitHub
romanoff commented on PR #2266: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2266#issuecomment-4027676269 @iffyio Updated. Thank you

Re: [PR] test: DataFusion PR #20806 (specialized SemiAntiSortMergeJoin operator) [datafusion-comet]

2026-03-09 Thread via GitHub
mbutrovich closed pull request #3648: test: DataFusion PR #20806 (specialized SemiAntiSortMergeJoin operator) URL: https://github.com/apache/datafusion-comet/pull/3648

Re: [PR] Fixed stage name parsing for snowflake [datafusion-sqlparser-rs]

2026-03-09 Thread via GitHub
romanoff commented on PR #2265: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2265#issuecomment-4027686256 @iffyio Updated. Thank you

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
comphead commented on code in PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#discussion_r2908545294 ## native/Cargo.toml: ## @@ -71,4 +71,5 @@ strip = "debuginfo" inherits = "release" lto = false # Skip LTO for faster linking codegen-units = 16

Re: [PR] Add clickbench parquet based queries to sql_planner benchmark [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on code in PR #13103: URL: https://github.com/apache/datafusion/pull/13103#discussion_r2908543586 ## datafusion/core/benches/sql_planner.rs: ## @@ -258,20 +303,25 @@ fn criterion_benchmark(c: &mut Criterion) { }) }); -c.bench_function("logi

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#issuecomment-4027713651 test failure is an existing issue in main branch and is fixed in https://github.com/apache/datafusion-comet/pull/3652. I will rebase this PR once that is merged

Re: [PR] feat: enable debug assertions in CI profile, fix unaligned memory access bug [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on PR #3652: URL: https://github.com/apache/datafusion-comet/pull/3652#issuecomment-4027715211 @mbutrovich I think this explains the bus error you saw recently

Re: [PR] Fix STORAGE LIFECYCLE POLICY for snowflake queries [datafusion-sqlparser-rs]

2026-03-09 Thread via GitHub
romanoff commented on PR #2264: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2264#issuecomment-4027720508 @iffyio Updated. Thank you

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
comphead commented on PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#issuecomment-4027725410 > test failure is an existing issue in main branch and is fixed in #3652. > > I will rebase this PR once that is merged The tests passed on CI, it says `sbt.ForkMai

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#issuecomment-4027728445 > > test failure is an existing issue in main branch and is fixed in #3652. > > I will rebase this PR once that is merged > > The tests passed on CI, it says `sbt.Fork

Re: [PR] feat : support spark compatible int to timestamp cast [datafusion]

2026-03-09 Thread via GitHub
coderfender commented on PR #20555: URL: https://github.com/apache/datafusion/pull/20555#issuecomment-4027732114 @martin-g , requesting review to port ceil function from comet to spark

Re: [PR] feat: spark compatible ceil function [datafusion]

2026-03-09 Thread via GitHub
coderfender commented on PR #20703: URL: https://github.com/apache/datafusion/pull/20703#issuecomment-4027732995 @martin-g , requesting review to port ceil function from comet to spark

Re: [PR] Fixed CHANGES keyword parsing for snowflake [datafusion-sqlparser-rs]

2026-03-09 Thread via GitHub
romanoff commented on PR #2266: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2266#issuecomment-4027737863 Had to add https://github.com/apache/datafusion-sqlparser-rs/pull/2266/changes#diff-80f1d10500bf0503869a8a33086830849ab5455192c0269291eefb70d11911e3L3211-R3214 due to fa

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#issuecomment-4027742337 > > > > test failure is an existing issue in main branch and is fixed in #3652. > > > > I will rebase this PR once that is merged > > > > > > > > > The tests pass

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#issuecomment-4027740564 > > > test failure is an existing issue in main branch and is fixed in #3652. > > > I will rebase this PR once that is merged > > > > > > The tests passed on CI, i

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4025092875 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on code in PR #20426: URL: https://github.com/apache/datafusion/pull/20426#discussion_r2906316372 ## datafusion/core/src/physical_planner.rs: ## @@ -3436,16 +3436,16 @@ mod tests { #[tokio::test] async fn in_list_types() -> Result<()> { -//

Re: [I] Release DataFusion `53.0.0` (Feb 2026 / Mar 2026) [datafusion]

2026-03-09 Thread via GitHub
comphead commented on issue #19692: URL: https://github.com/apache/datafusion/issues/19692#issuecomment-4025076907 > maybe we should backport [#20785](https://github.com/apache/datafusion/pull/20785) to 53 release. at the moment build with 52.2 will break with: > wdyt [@comphead](https:/

Re: [PR] perf: optimize scatter with type-specific specialization [datafusion]

2026-03-09 Thread via GitHub
CuteChuanChuan commented on PR #20498: URL: https://github.com/apache/datafusion/pull/20498#issuecomment-4025127060 Addressed review comments: - Added all-null mask fast path - Replaced ArrayDataBuilder with safe constructors (PrimitiveArray, BooleanArray, FixedSizeBinaryArray) and add

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4025330608 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [I] Release DataFusion `53.0.0` (Feb 2026 / Mar 2026) [datafusion]

2026-03-09 Thread via GitHub
comphead commented on issue #19692: URL: https://github.com/apache/datafusion/issues/19692#issuecomment-4025630574 > i would suggest to pin substrait version to previous for 53 release, instead of merging any of them at this point if you agree Ic, feel free to create a PR to `branch-5

Re: [I] Latest substrait patch release does not compile against DF [datafusion]

2026-03-09 Thread via GitHub
benbellick commented on issue #20756: URL: https://github.com/apache/datafusion/issues/20756#issuecomment-4025657071 [Copying my comment from here](https://github.com/apache/datafusion/pull/20785#issuecomment-4025653255): > This release of substrait-rs breaks semver compatibility. This wa

Re: [PR] fix/substrait-lambda-rextype-0.62.3+ [datafusion]

2026-03-09 Thread via GitHub
benbellick commented on PR #20785: URL: https://github.com/apache/datafusion/pull/20785#issuecomment-4025653255 This release of `substrait-rs` breaks semver compatibility. This was an oversight on my end due to the fact that `release-plz` didn't detect a submodule bump as a breaking change.

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r2907199070 ## datafusion/datasource/src/morsel.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4025876117 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4025875574 run benchmark tpcds tpch clickbench_partitioned clickbench_extended

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-09 Thread via GitHub
martin-g commented on code in PR #20426: URL: https://github.com/apache/datafusion/pull/20426#discussion_r2908005743 ## datafusion/expr-common/src/signature.rs: ## @@ -158,7 +158,7 @@ pub enum Arity { pub enum TypeSignature { /// One or more arguments of a common type out

Re: [PR] feat : support spark compatible int to timestamp cast [datafusion]

2026-03-09 Thread via GitHub
coderfender commented on code in PR #20555: URL: https://github.com/apache/datafusion/pull/20555#discussion_r2908195575 ## datafusion/spark/src/function/conversion/cast.rs: ## @@ -0,0 +1,633 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] [RFC] Add lambda support and array_transform udf [datafusion]

2026-03-09 Thread via GitHub
LiaCastaneda commented on code in PR #18921: URL: https://github.com/apache/datafusion/pull/18921#discussion_r2908204189 ## DOC.md: ## @@ -0,0 +1,1166 @@ +This PR adds support for lambdas with column capture and the `array_transform` function used to test the lambda implementat

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-09 Thread via GitHub
neilconway commented on code in PR #20426: URL: https://github.com/apache/datafusion/pull/20426#discussion_r2908222551 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -843,102 +842,100 @@ pub fn try_type_union_resolution_with_struct( Ok(final_struct_types) }

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on PR #20806: URL: https://github.com/apache/datafusion/pull/20806#issuecomment-4027275844 run benchmark tpch DATAFUSION_OPTIMIZER_PREFER_HASH_JOIN=false

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20806: URL: https://github.com/apache/datafusion/pull/20806#issuecomment-4027276009 🤖 Hi @Dandandan, thanks for the request (https://github.com/apache/datafusion/pull/20806#issuecomment-4027275844). [`scrape_comments.py`](https://github.com/alamb/datafusi

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#issuecomment-4024881013 moving to draft while I run new benchmarks to confirm the benefit

Re: [PR] [REST]: Cancelling a completed/failed job should return "false" [datafusion-ballista]

2026-03-09 Thread via GitHub
milenkovicm commented on PR #1494: URL: https://github.com/apache/datafusion-ballista/pull/1494#issuecomment-4024885809 please merge if you don't want to add anything else

Re: [PR] feat: Enable native c2r by default, add debug asserts [datafusion-comet]

2026-03-09 Thread via GitHub
andygrove commented on code in PR #3649: URL: https://github.com/apache/datafusion-comet/pull/3649#discussion_r2906413832 ## native/Cargo.toml: ## @@ -71,4 +71,5 @@ strip = "debuginfo" inherits = "release" lto = false # Skip LTO for faster linking codegen-units = 16

Re: [PR] Cherry pick fixes from 46 [datafusion]

2026-03-09 Thread via GitHub
avantgardnerio closed pull request #20824: Cherry pick fixes from 46 URL: https://github.com/apache/datafusion/pull/20824

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4024949804 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Cherry pick fixes from 46 [datafusion]

2026-03-09 Thread via GitHub
avantgardnerio commented on PR #20824: URL: https://github.com/apache/datafusion/pull/20824#issuecomment-4024953460 Wrong repo again, sorry.

[PR] Cherry pick fixes from 46 [datafusion]

2026-03-09 Thread via GitHub
avantgardnerio opened a new pull request, #20824: URL: https://github.com/apache/datafusion/pull/20824 (no comment)

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-03-09 Thread via GitHub
Dandandan commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4024948431 run benchmark tpcds tpch clickbench_partitioned clickbench_extended

Re: [PR] Allow filters on struct fields to be pushed down into Parquet scan [datafusion]

2026-03-09 Thread via GitHub
adriangb commented on code in PR #20822: URL: https://github.com/apache/datafusion/pull/20822#discussion_r2906480912 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -368,6 +399,31 @@ impl TreeNodeVisitor<'_> for PushdownChecker<'_> { type Node = Arc; fn f_d

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
comphead commented on PR #20806: URL: https://github.com/apache/datafusion/pull/20806#issuecomment-4025013189 Thanks @mbutrovich, IMO it is great to start replacing the current SMJ implementation with a more performant and easier-to-maintain structure, even piece by piece. It is a good sign that the fuzz tests passed.

Re: [PR] perf: specialized SemiAntiSortMergeJoin operator [datafusion]

2026-03-09 Thread via GitHub
mbutrovich commented on code in PR #20806: URL: https://github.com/apache/datafusion/pull/20806#discussion_r2906537029 ## benchmarks/src/smj.rs: ## @@ -277,6 +277,7 @@ const SMJ_QUERIES: &[&str] = &[ WHERE EXISTS ( SELECT 1 FROM t2_sorted WHERE

Re: [PR] Rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` [datafusion]

2026-03-09 Thread via GitHub
alamb commented on PR #20749: URL: https://github.com/apache/datafusion/pull/20749#issuecomment-4026851535 FWIW this PR doesn't add a new optimizer rule, it extends an existing one

Re: [PR] Clean up date_part preimage implementation [datafusion]

2026-03-09 Thread via GitHub
alamb commented on PR #20350: URL: https://github.com/apache/datafusion/pull/20350#issuecomment-4026842812 Thanks again @sdf-jkl

Re: [PR] Rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` [datafusion]

2026-03-09 Thread via GitHub
alamb commented on code in PR #20749: URL: https://github.com/apache/datafusion/pull/20749#discussion_r2907885397 ## datafusion/expr/src/udaf.rs: ## @@ -691,6 +707,74 @@ pub trait AggregateUDFImpl: Debug + DynEq + DynHash + Send + Sync { None } +/// Rewrite

Re: [PR] Rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` [datafusion]

2026-03-09 Thread via GitHub
alamb commented on PR #20749: URL: https://github.com/apache/datafusion/pull/20749#issuecomment-4026897337 > > I'm concerned that we start to have multiple optimizers that have to coordinate (namely CSE, the expression pushdown for get_field and this) but I don't have any suggestions for a

Re: [PR] Fix decimal log precision for non-power values [datafusion]

2026-03-09 Thread via GitHub
martin-g commented on code in PR #20433: URL: https://github.com/apache/datafusion/pull/20433#discussion_r2907911166 ## datafusion/functions/src/math/log.rs: ## Review Comment: ```suggestion ``` remove some debug leftover
