Re: [PR] chore: Format examples in doc strings - spark, sql, sqllogictest, sibstrait [datafusion]

2025-11-07 Thread via GitHub
CuteChuanChuan commented on PR #18443: URL: https://github.com/apache/datafusion/pull/18443#issuecomment-3505579614 @alamb Thanks! ❤️ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-11-07 Thread via GitHub
github-actions[bot] commented on PR #17090: URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3505596699 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feature: sort by/cluster by/distribute by [datafusion]

2025-11-07 Thread via GitHub
github-actions[bot] commented on PR #16310: URL: https://github.com/apache/datafusion/pull/16310#issuecomment-3505596814 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feature: sort by/cluster by/distribute by [datafusion]

2025-11-07 Thread via GitHub
chenkovsky commented on PR #16310: URL: https://github.com/apache/datafusion/pull/16310#issuecomment-3505607880 keep it open

Re: [PR] feat: added clippy::needless_pass_by_value lint rule to datafusion/expr [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 commented on code in PR #18532: URL: https://github.com/apache/datafusion/pull/18532#discussion_r2505973071 ## datafusion/expr/src/lib.rs: ## @@ -24,6 +24,12 @@ // https://github.com/apache/datafusion/issues/11143 #![deny(clippy::clone_on_ref_ptr)] +// https://git

Re: [PR] minor: Remove inconsistent comment [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 commented on code in PR #18539: URL: https://github.com/apache/datafusion/pull/18539#discussion_r2505977426 ## datafusion/common/src/lib.rs: ## @@ -25,8 +25,6 @@ #![deny(clippy::clone_on_ref_ptr)] // https://github.com/apache/datafusion/issues/18503 #![deny(clippy:

[PR] minor: Remove inconsistent comment [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 opened a new pull request, #18539: URL: https://github.com/apache/datafusion/pull/18539 ## Which issue does this PR close? - Closes #. ## Rationale for this change In https://github.com/apache/datafusion/pull/18468, there is an inconsistent comment

Re: [PR] ci: enforce needless_pass_by_value for datafusion-optimzer [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 merged PR #18533: URL: https://github.com/apache/datafusion/pull/18533

Re: [I] Enforce lint rule `clippy::needless_pass_by_value` to `datafusion-optimizer` [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 closed issue #18505: Enforce lint rule `clippy::needless_pass_by_value` to `datafusion-optimizer` URL: https://github.com/apache/datafusion/issues/18505

Re: [PR] feature: sort by/cluster by/distribute by [datafusion]

2025-11-07 Thread via GitHub
mbutrovich commented on PR #16310: URL: https://github.com/apache/datafusion/pull/16310#issuecomment-3505616749 I’ll take a look next week.

Re: [PR] WIP: Pin to Adaptive Parquet Predicate Pushdown [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18368: URL: https://github.com/apache/datafusion/pull/18368#issuecomment-3505527818 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_adapative_scan Benchmark clickbench_extended.json

Re: [PR] chore: Fallback to Spark for windows functions [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove commented on code in PR #2726: URL: https://github.com/apache/datafusion-comet/pull/2726#discussion_r2505938148 ## spark/src/test/scala/org/apache/comet/exec/CometWindowExecSuite.scala: ## @@ -39,12 +41,86 @@ class CometWindowExecSuite extends CometTestBase { supe

Re: [PR] WIP: Pin to Adaptive Parquet Predicate Pushdown [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18368: URL: https://github.com/apache/datafusion/pull/18368#issuecomment-3505527938 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [PR] WIP: Pin to Adaptive Parquet Predicate Pushdown [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18368: URL: https://github.com/apache/datafusion/pull/18368#issuecomment-3505546651 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_adapative_scan Benchmark clickbench_pushdown.json

Re: [PR] WIP: Pin to Adaptive Parquet Predicate Pushdown [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18368: URL: https://github.com/apache/datafusion/pull/18368#issuecomment-3505546690 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-11-07 Thread via GitHub
pepijnve commented on code in PR #17813: URL: https://github.com/apache/datafusion/pull/17813#discussion_r2505950506 ## datafusion/expr/src/predicate_eval.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Add comments to Cargo.toml about workspace overrides [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 merged PR #18526: URL: https://github.com/apache/datafusion/pull/18526

Re: [PR] fix: Eliminate consecutive repartitions [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18521: URL: https://github.com/apache/datafusion/pull/18521#issuecomment-3505656599 🤖: Benchmark completed Details ``` Comparing HEAD and gene.bordegaray_2025_10_avoid_consecutive_repartition_exec Benchmark clickbench_

Re: [PR] WIP: Pin to Adaptive Parquet Predicate Pushdown [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18368: URL: https://github.com/apache/datafusion/pull/18368#issuecomment-3505577091 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_adapative_scan Benchmark clickbench_extended.json

Re: [PR] fix: Eliminate consecutive repartitions [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18521: URL: https://github.com/apache/datafusion/pull/18521#issuecomment-3505577154 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

[PR] refactor: merge CoalesceAsyncExecInput into CoalesceBatches [datafusion]

2025-11-07 Thread via GitHub
Tim-53 opened a new pull request, #18540: URL: https://github.com/apache/datafusion/pull/18540 ## Which issue does this PR close? - Closes #18155. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] chore: Fallback to Spark for windows functions [datafusion-comet]

2025-11-07 Thread via GitHub
comphead commented on code in PR #2726: URL: https://github.com/apache/datafusion-comet/pull/2726#discussion_r2506199769 ## spark/src/test/scala/org/apache/comet/exec/CometWindowExecSuite.scala: ## @@ -39,12 +41,86 @@ class CometWindowExecSuite extends CometTestBase { super

Re: [I] Consider folding `CoalesceAsyncExecInput` physical optimizer rule into `CoalesceBatches` [datafusion]

2025-11-07 Thread via GitHub
Tim-53 commented on issue #18155: URL: https://github.com/apache/datafusion/issues/18155#issuecomment-3505903173 I noticed there hasn’t been any recent activity on this issue, so I went ahead and opened PR #18540 that folds CoalesceAsyncExecInput into CoalesceBatches as proposed.

Re: [PR] chore: Fallback to Spark for windows functions [datafusion-comet]

2025-11-07 Thread via GitHub
comphead commented on code in PR #2726: URL: https://github.com/apache/datafusion-comet/pull/2726#discussion_r2506240503 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -251,7 +251,7 @@ object CometConf extends ShimCometConf { val COMET_EXEC_EXPAND_ENABLED: C

[PR] chore: Format examples in doc strings - follow-up for main branch fe2469 [datafusion]

2025-11-07 Thread via GitHub
CuteChuanChuan opened a new pull request, #18541: URL: https://github.com/apache/datafusion/pull/18541 ## Which issue does this PR close? Part of #16915 ## Rationale for this change Format code examples in documentation comments to improve readability and maintain consistent cod

[PR] ci: add check for doc comment formatting [datafusion]

2025-11-07 Thread via GitHub
CuteChuanChuan opened a new pull request, #18542: URL: https://github.com/apache/datafusion/pull/18542 ## Which issue does this PR close? Closes #16915. ## Rationale for this change This PR adds CI enforcement to ensure all code examples in documentation comments are pro

Re: [PR] minor: Remove inconsistent comment [datafusion]

2025-11-07 Thread via GitHub
2010YOUY01 merged PR #18539: URL: https://github.com/apache/datafusion/pull/18539

Re: [PR] Refactor `log()` signature to use coercion API + fixes [datafusion]

2025-11-07 Thread via GitHub
Jefffrey merged PR #18519: URL: https://github.com/apache/datafusion/pull/18519

Re: [PR] Consolidate builtin functions examples (#18142) [datafusion]

2025-11-07 Thread via GitHub
cj-zhukov commented on code in PR #18523: URL: https://github.com/apache/datafusion/pull/18523#discussion_r2506280195 ## datafusion-examples/examples/builtin_functions/date_time.rs: ## @@ -26,8 +26,7 @@ use datafusion::common::assert_contains; use datafusion::error::Result; us

Re: [PR] We have now the CI ensure all doc strings remain formatted [datafusion]

2025-11-07 Thread via GitHub
CuteChuanChuan commented on PR #16916: URL: https://github.com/apache/datafusion/pull/16916#issuecomment-3506018859 Hi @alamb, while working on enabling the CI check to ensure consistent formatting, I found a conflict: the output of `cargo fmt` does not pass this command `cargo +n

Re: [PR] docs: Move TopK example code to extending-operators documentation [datafusion]

2025-11-07 Thread via GitHub
gowtham1412-p commented on PR #18372: URL: https://github.com/apache/datafusion/pull/18372#issuecomment-3506024792 @Jefffrey Thank you for catching that! No, completely replacing the content was not intended. I'll update the PR to preserve the existing µWheel documentation and add the TopK

Re: [PR] docs: Move TopK example code to extending-operators documentation [datafusion]

2025-11-07 Thread via GitHub
gowtham1412-p commented on PR #18372: URL: https://github.com/apache/datafusion/pull/18372#issuecomment-3506029986 @Jefffrey I've updated the PR to preserve the existing µWheel documentation. The TopK example is now added as a second example rather than replacing the original content. Pleas

Re: [I] Regression: `sql_planner` benchmark panic'ing on main [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3503937206 While reviewing this PR - https://github.com/apache/datafusion/pull/17813 I had another idea about how to preserve the improvements made to nullif but avoid this bug.

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-11-07 Thread via GitHub
alamb commented on code in PR #17813: URL: https://github.com/apache/datafusion/pull/17813#discussion_r2504683223 ## datafusion/expr/src/predicate_eval.rs: ## Review Comment: Perhaps if we called this module `partial_eval.rs` it would better reflect its contents ###

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18448: URL: https://github.com/apache/datafusion/pull/18448#issuecomment-3503969755 I'll reschedule to help

Re: [I] date_trunc incorrect results in non-UTC timezone [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove commented on issue #2649: URL: https://github.com/apache/datafusion-comet/issues/2649#issuecomment-3503993365 I have a smaller repro: ```scala test("sort on timestamp after changing session timezone") { // create data in specific timezone withSQLConf(SQLConf.SE

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
adriangb merged PR #18448: URL: https://github.com/apache/datafusion/pull/18448

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #17813: URL: https://github.com/apache/datafusion/pull/17813#issuecomment-3504237083 🤖: Benchmark completed Details ``` group issue_17801 main -

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18448: URL: https://github.com/apache/datafusion/pull/18448#issuecomment-3504237589 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [I] Regression: planning TPC-DS query 75 panic'ing on main [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3504183865 I made a PR that shows this is a real problem (not just a benchmark problem) - https://github.com/apache/datafusion/pull/18536

Re: [PR] chore: [iceberg] test iceberg 1.10.0 [datafusion-comet]

2025-11-07 Thread via GitHub
hsiang-c commented on code in PR #2709: URL: https://github.com/apache/datafusion-comet/pull/2709#discussion_r2505001752 ## dev/diffs/iceberg/1.10.0.diff: ## @@ -0,0 +1,1770 @@ +diff --git a/build.gradle b/build.gradle +index 6bc052885fc..db2aca3a5ee 100644 +--- a/build.gradle +

Re: [PR] perf: improve performance of `vectorized_equal_to` for `PrimitiveGroupValueBuilder` in multi group by aggregation [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #17977: URL: https://github.com/apache/datafusion/pull/17977#issuecomment-3505287847 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [PR] minor: Remove some hard-coded UTC timezones [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove closed pull request #2731: minor: Remove some hard-coded UTC timezones URL: https://github.com/apache/datafusion-comet/pull/2731

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-11-07 Thread via GitHub
pepijnve commented on code in PR #17813: URL: https://github.com/apache/datafusion/pull/17813#discussion_r2505791901 ## datafusion/expr/src/predicate_eval.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [I] Planning time for queries with many columns with union and order by is very slow [datafusion]

2025-11-07 Thread via GitHub
logan-keede commented on issue #17261: URL: https://github.com/apache/datafusion/issues/17261#issuecomment-3505310653 After some testing to narrow the cause of the difference between benchmark and datafusion-cli, I found out that the cause is the fact that `UInt64` somehow does not get opti

Re: [PR] fix: Use Spark session timezone in native execution when creating Arrow schema [WIP] [datafusion-comet]

2025-11-07 Thread via GitHub
codecov-commenter commented on PR #2734: URL: https://github.com/apache/datafusion-comet/pull/2734#issuecomment-3505317457 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2734?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: added lint rule to, solved all errors that arose and updated tests to pass [datafusion]

2025-11-07 Thread via GitHub
Gohlub commented on PR #18520: URL: https://github.com/apache/datafusion/pull/18520#issuecomment-3503502547 > > Yes, I ran cargo test --package datafusion-expr and all but one passed (based on my observations, it seems to not be related to my changes). The failed test is: > > test expr_r

Re: [PR] feat: added lint rule to, solved all errors that arose and updated tests to pass [datafusion]

2025-11-07 Thread via GitHub
Gohlub closed pull request #18520: feat: added lint rule to, solved all errors that arose and updated tests to pass URL: https://github.com/apache/datafusion/pull/18520

Re: [PR] fix: checkSparkMaybeThrows should compare Spark and Comet results in success case [datafusion-comet]

2025-11-07 Thread via GitHub
codecov-commenter commented on PR #2728: URL: https://github.com/apache/datafusion-comet/pull/2728#issuecomment-3503512926 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2728?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Update ClickBench benchmarks with DataFusion 50.0.0 [datafusion]

2025-11-07 Thread via GitHub
adriangb commented on issue #17721: URL: https://github.com/apache/datafusion/issues/17721#issuecomment-3503514798 > Is this configuration we'd want to enable by default? I don't know what the logistics are on this... my understanding is that ClickBench wants engines to run "default"

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18448: URL: https://github.com/apache/datafusion/pull/18448#issuecomment-3503472592 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [I] Update ClickBench benchmarks with DataFusion 50.0.0 [datafusion]

2025-11-07 Thread via GitHub
pmcgleenon commented on issue #17721: URL: https://github.com/apache/datafusion/issues/17721#issuecomment-3503474861 quick update on this, after enabling the configuration mentioned in the [blog](https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/) there is an improvement for Q23

Re: [PR] Normalize partitioned and flat object listing [datafusion]

2025-11-07 Thread via GitHub
alamb commented on code in PR #18146: URL: https://github.com/apache/datafusion/pull/18146#discussion_r2504451187 ## datafusion/core/tests/datasource/object_store_access.rs: ## @@ -194,17 +183,8 @@ async fn query_partitioned_csv_file() { +-+---+---+---++

Re: [I] TPCH q1 with no predicates is 2x slower than duckdb [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #18411: URL: https://github.com/apache/datafusion/issues/18411#issuecomment-3503726327 > The decoder maintained a hash table of strings and single instanced everything and memoized hash values. This is basically how the existing ByteViewGroupBy thing works (it

Re: [I] Update ClickBench benchmarks with DataFusion 50.0.0 [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #17721: URL: https://github.com/apache/datafusion/issues/17721#issuecomment-3503700419 https://github.com/ClickHouse/ClickBench?tab=readme-ov-file says > By default, all tests are run on c6a.4xlarge VM in AWS with 500 GB gp2. The docs about gp2 say t

Re: [PR] Normalize partitioned and flat object listing [datafusion]

2025-11-07 Thread via GitHub
BlakeOrth commented on code in PR #18146: URL: https://github.com/apache/datafusion/pull/18146#discussion_r2504706437 ## datafusion/core/tests/datasource/object_store_access.rs: ## @@ -194,17 +183,8 @@ async fn query_partitioned_csv_file() { +-+---+---+---+-

[PR] feat: support batched table functions [datafusion]

2025-11-07 Thread via GitHub
bubulalabu opened a new pull request, #18535: URL: https://github.com/apache/datafusion/pull/18535 # Batched Table Functions with LATERAL Join Support **Note: This is a draft PR for gathering feedback on the approach. It's working, but not polished yet** I'm interested to hear

[I] `join_using` creates duplicate output fields [datafusion]

2025-11-07 Thread via GitHub
timsaucer opened a new issue, #18537: URL: https://github.com/apache/datafusion/issues/18537 ### Describe the bug From my reading of documentation a `JOIN USING` call should remove duplicate fields on the output. In our current implementation `DataFrame` does not have a `join_u

Re: [PR] fix: Use Spark session timezone in native execution when creating Arrow schema [WIP] [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove commented on PR #2734: URL: https://github.com/apache/datafusion-comet/pull/2734#issuecomment-3505350980 @parthchandra @mbutrovich I'm not looking for a review yet, but I'd like to discuss this with you both next week. I understand the issue much better now.

Re: [PR] fix: NormalizeNaNAndZero::children() returns child's child [datafusion-comet]

2025-11-07 Thread via GitHub
mbutrovich merged PR #2732: URL: https://github.com/apache/datafusion-comet/pull/2732

Re: [PR] WIP: Pin to Adaptive Parquet Predicate Pushdown [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18368: URL: https://github.com/apache/datafusion/pull/18368#issuecomment-3505402993 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [PR] perf: improve performance of `vectorized_equal_to` for `PrimitiveGroupValueBuilder` in multi group by aggregation [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #17977: URL: https://github.com/apache/datafusion/pull/17977#issuecomment-3505402927 🤖: Benchmark completed Details ``` Comparing HEAD and optimize-primitive-multi-group-by-to-use-simd Benchmark clickbench_extended.json

Re: [PR] CI: add `clippy::needless_pass_by_value` rule [datafusion]

2025-11-07 Thread via GitHub
alamb merged PR #18468: URL: https://github.com/apache/datafusion/pull/18468

Re: [PR] CI: add `clippy::needless_pass_by_value` rule [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18468: URL: https://github.com/apache/datafusion/pull/18468#issuecomment-3501847377 > Thanks @2010YOUY01 and @Jefffrey > > I will make a PR to add some comments to try and help avoid this situation again - https://github.com/apache/datafusion/pull/18526

Re: [I] Enforce lint rule `clippy::needless_pass_by_value` to `datafusion-optimizer` [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #18505: URL: https://github.com/apache/datafusion/issues/18505#issuecomment-3501864507 > Hello, I was looking at the reference PR [#18468](https://github.com/apache/datafusion/pull/18468). From what I understand, for this task, we need to add the clippy rule in `Car

[PR] Add comments to Cargo.toml about workspace overrides [datafusion]

2025-11-07 Thread via GitHub
alamb opened a new pull request, #18526: URL: https://github.com/apache/datafusion/pull/18526 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/18468 ## Rationale for this change We missed the fact that you couldn't yet ad

Re: [PR] chore: Improve framework for specifying that configs can be set with env vars [datafusion-comet]

2025-11-07 Thread via GitHub
martin-g commented on code in PR #2722: URL: https://github.com/apache/datafusion-comet/pull/2722#discussion_r2502287903 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -865,6 +861,37 @@ private class TypedConfigBuilder[T]( CometConf.register(conf) con

Re: [I] Memory should not blow up after Arrow IPC write-read round trip during spilling [datafusion]

2025-11-07 Thread via GitHub
milenkovicm commented on issue #17340: URL: https://github.com/apache/datafusion/issues/17340#issuecomment-3501542772 The DataFusion log says I should report an issue with "record batch exceeds the expected limit by more than the allowed tolerance". Unfortunately, at the moment I have no reproducer

Re: [I] Release DataFusion `51.0.0` (Nov 2025) [datafusion]

2025-11-07 Thread via GitHub
milenkovicm commented on issue #17558: URL: https://github.com/apache/datafusion/issues/17558#issuecomment-3501407529 > > How did you pick that commit? There's no branch-51 yet, so unsure what will make the cut. Should I test Comet with latest main commit? > > Yes I just picked the la

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on code in PR #18448: URL: https://github.com/apache/datafusion/pull/18448#discussion_r2502900983 ## datafusion/common/src/hash_utils.rs: ## @@ -366,83 +362,113 @@ fn hash_fixed_list_array( Ok(()) } -/// Test version of `create_hashes` that produces the s

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on code in PR #18448: URL: https://github.com/apache/datafusion/pull/18448#discussion_r2504411096 ## datafusion/common/src/hash_utils.rs: ## @@ -366,83 +362,123 @@ fn hash_fixed_list_array( Ok(()) } -/// Test version of `create_hashes` that produces the s

Re: [I] Update ClickBench benchmarks with DataFusion 50.0.0 [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #17721: URL: https://github.com/apache/datafusion/issues/17721#issuecomment-3503612172 (interesting to me is that c6a.2xlarge is a graviton processor, which are not known for being the fastest CPUs around)

Re: [PR] chore: upgrade to DataFusion 51.0.0 and Arrow-rs 57.0.0 [datafusion-comet]

2025-11-07 Thread via GitHub
mbutrovich commented on PR #2729: URL: https://github.com/apache/datafusion-comet/pull/2729#issuecomment-3503608773 Marking as draft since this is just for testing until DataFusion 51.0.0 crates are available.

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18448: URL: https://github.com/apache/datafusion/pull/18448#issuecomment-3504578854 🤖: Benchmark completed Details ``` Comparing HEAD and refactor-create-hashes Benchmark clickbench_extended.json ---

Re: [I] Update ClickBench benchmarks with DataFusion 50.0.0 [datafusion]

2025-11-07 Thread via GitHub
Omega359 commented on issue #17721: URL: https://github.com/apache/datafusion/issues/17721#issuecomment-3504631958 > (interesting to me is that c6a.2xlarge is a graviton processor, which are not known for being the fastest CPUs around) Um, no? "Amazon EC2 C6a instances are powered by

Re: [I] Slow aggregrate query with `array_agg`, Polars is 4 times faster for equal query [datafusion]

2025-11-07 Thread via GitHub
duongcongtoai commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3504689479 ``` toai@salamancabrothehood:~/proj/rust/playpy$ bash test.sh sample-1m.parquet Benchmark 1: uv run polar.py sample-1m.parquet Time (mean ± σ): 867.0 ms ± 2

[PR] chore: Remove some hard-coded UTC timezones [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove opened a new pull request, #2731: URL: https://github.com/apache/datafusion-comet/pull/2731 ## Which issue does this PR close? Partial fix for https://github.com/apache/datafusion-comet/issues/2730 ## Rationale for this change Hard-coded time zon

Re: [I] Optimize `date_part` Minute by avoiding unnecessary computation [datafusion]

2025-11-07 Thread via GitHub
Omega359 commented on issue #14043: URL: https://github.com/apache/datafusion/issues/14043#issuecomment-3504831862 @jayzhan211 is this still a concern? It looks like all the changes were in arrow-rs; I'm not sure what would be needed on the DataFusion side for this

Re: [PR] Refactor create_hashes to accept array references [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18448: URL: https://github.com/apache/datafusion/pull/18448#issuecomment-3504977009 it is strange that the [second run](https://github.com/apache/datafusion/pull/18448#issuecomment-3504578854) also shows speedups. I agree that overall it looks good to me and shipping

Re: [PR] refactor: update cmp and nested data in binary operator [datafusion]

2025-11-07 Thread via GitHub
sunng87 commented on code in PR #18256: URL: https://github.com/apache/datafusion/pull/18256#discussion_r2505580374 ## datafusion/physical-expr-common/src/datum.rs: ## @@ -79,12 +109,18 @@ pub fn apply_cmp_for_nested( | Operator::GtEq | Operator::IsDist

Re: [PR] fix: Eliminate consecutive repartitions [datafusion]

2025-11-07 Thread via GitHub
alamb commented on PR #18521: URL: https://github.com/apache/datafusion/pull/18521#issuecomment-3505027156 @NGA-TRAN and @gabotechs could you also please help review this PR?

[I] [EPIC] [DISCUSS] Comet timezone handling [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove opened a new issue, #2733: URL: https://github.com/apache/datafusion-comet/issues/2733 ### What is the problem the feature request solves? Comet currently assumes that all native processing uses the UTC timezone. When reading from Parquet sources, Comet converts timestamps t

Re: [PR] chore(deps): bump taiki-e/install-action from 2.62.46 to 2.62.47 [datafusion]

2025-11-07 Thread via GitHub
comphead commented on PR #18508: URL: https://github.com/apache/datafusion/pull/18508#issuecomment-3505030602 @dependabot rebase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] chore(deps): bump taiki-e/install-action from 2.62.46 to 2.62.47 [datafusion]

2025-11-07 Thread via GitHub
dependabot[bot] commented on PR #18508: URL: https://github.com/apache/datafusion/pull/18508#issuecomment-3505041747 Looks like this PR has been edited by someone other than Dependabot. That means Dependabot can't rebase it - sorry! If you're happy for Dependabot to recreate it from s

Re: [PR] Consolidate builtin functions examples (#18142) [datafusion]

2025-11-07 Thread via GitHub
comphead commented on code in PR #18523: URL: https://github.com/apache/datafusion/pull/18523#discussion_r2505594899 ## datafusion-examples/examples/builtin_functions/date_time.rs: ## @@ -26,8 +26,7 @@ use datafusion::common::assert_contains; use datafusion::error::Result; use

Re: [I] Support "pre-image" for pruning predicate evaluation [datafusion]

2025-11-07 Thread via GitHub
sdf-jkl commented on issue #18320: URL: https://github.com/apache/datafusion/issues/18320#issuecomment-3504798402 I checked the clickhouse implementation of preimage and they only support the [`toYear`](https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/DateTimeTransforms.h#

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-11-07 Thread via GitHub
pepijnve commented on code in PR #17813: URL: https://github.com/apache/datafusion/pull/17813#discussion_r2505598808 ## datafusion/expr/src/predicate_eval.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-11-07 Thread via GitHub
pepijnve commented on code in PR #17813: URL: https://github.com/apache/datafusion/pull/17813#discussion_r2505601659 ## datafusion/expr/src/predicate_eval.rs: ## Review Comment: I went for `predicate` since it only applies to boolean predicates. `partial_predicate_eval`? 🤷

Re: [I] Release DataFusion `51.0.0` (Nov 2025) [datafusion]

2025-11-07 Thread via GitHub
alamb commented on issue #17558: URL: https://github.com/apache/datafusion/issues/17558#issuecomment-3505053700 Given the current state of tests, I think we should cut `branch-51` tomorrow (and backport anything else we need to get the release out). I think @xudong963 said he was awa

Re: [PR] fix: checkSparkMaybeThrows should compare Spark and Comet results in success case [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove commented on PR #2728: URL: https://github.com/apache/datafusion-comet/pull/2728#issuecomment-3505049592 I am looking into this failure: ``` 2025-11-07T16:55:12.8149030Z - cast ArrayType to StringType *** FAILED *** (23 seconds, 552 milliseconds) 2025-11-07T16:55:12.8

Re: [PR] fix: Eliminate consecutive repartitions [datafusion]

2025-11-07 Thread via GitHub
gene-bordegaray commented on PR #18521: URL: https://github.com/apache/datafusion/pull/18521#issuecomment-3505062120 > This looks amazing @gene-bordegaray -- thank you 🙏 > > I kicked off some benchmarks to make sure it doesn't impact performance. Assuming not I'll then try and take a

Re: [PR] feat: Support ANSI mode sum expr [datafusion-comet]

2025-11-07 Thread via GitHub
comphead commented on PR #2600: URL: https://github.com/apache/datafusion-comet/pull/2600#issuecomment-3505254319 ``` - Windows support *** FAILED *** (3 seconds, 973 milliseconds) Expected only Comet native operators, but found Project. plan: Project +- Window +- C

[PR] fix: Use Spark session timezone in native execution when creating Arrow schema [WIP] [datafusion-comet]

2025-11-07 Thread via GitHub
andygrove opened a new pull request, #2734: URL: https://github.com/apache/datafusion-comet/pull/2734 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] chore: Format examples in doc strings - spark, sql, sqllogictest, sibstrait [datafusion]

2025-11-07 Thread via GitHub
Jefffrey merged PR #18443: URL: https://github.com/apache/datafusion/pull/18443 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] refactor: simplify `calculate_binary_math` in datafusion-functions [datafusion]

2025-11-07 Thread via GitHub
Jefffrey commented on code in PR #18525: URL: https://github.com/apache/datafusion/pull/18525#discussion_r2505752475 ## datafusion/functions/src/utils.rs: ## @@ -140,38 +140,25 @@ where F: Fn(L::Native, R::Native) -> Result, R::Native: TryFrom, { -Ok(match right {

Re: [PR] fix: Eliminate consecutive repartitions [datafusion]

2025-11-07 Thread via GitHub
adriangb commented on code in PR #18521: URL: https://github.com/apache/datafusion/pull/18521#discussion_r2505769532 ## datafusion/physical-optimizer/src/enforce_distribution.rs: ## @@ -1273,7 +1273,7 @@ pub fn ensure_distribution( child = add_merge_on_top(c

Re: [I] Avoid consecutive RepartitionExec [datafusion]

2025-11-07 Thread via GitHub
gene-bordegaray commented on issue #18341: URL: https://github.com/apache/datafusion/issues/18341#issuecomment-3505283923 > [@gene-bordegaray](https://github.com/gene-bordegaray) : Great analysis. I have read both `full report` that includes your studying how physical rules work and the the

Re: [PR] minor: Remove some hard-coded UTC timezones [datafusion-comet]

2025-11-07 Thread via GitHub
codecov-commenter commented on PR #2731: URL: https://github.com/apache/datafusion-comet/pull/2731#issuecomment-3504890002 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2731?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] doc : Documentation for implementing a new expression is outdated. [datafusion-comet]

2025-11-07 Thread via GitHub
mbutrovich closed issue #2150: doc : Documentation for implementing a new expression is outdated. URL: https://github.com/apache/datafusion-comet/issues/2150 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t
