[PR] chore(deps): bump crate-ci/typos from 1.37.0 to 1.37.1 [datafusion]

2025-10-02 Thread via GitHub
dependabot[bot] opened a new pull request, #17878: URL: https://github.com/apache/datafusion/pull/17878 Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.37.0 to 1.37.1. Release notes sourced from crate-ci/typos's releases (https://github.com/crate-ci/typos/releases).

Re: [PR] fix: UnnestExec preserves relevant equivalence properties of input [datafusion]

2025-10-02 Thread via GitHub
vegarsti commented on PR #16985: URL: https://github.com/apache/datafusion/pull/16985#issuecomment-3359753099 @berkaysynnada @suremarc @alamb Gentle ping for a review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[PR] chore(deps): bump taiki-e/install-action from 2.62.14 to 2.62.16 [datafusion]

2025-10-02 Thread via GitHub
dependabot[bot] opened a new pull request, #17879: URL: https://github.com/apache/datafusion/pull/17879 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.14 to 2.62.16. Release notes sourced from https://github.com/taiki-e/install-action/releases

[PR] optimizer: allow projection pushdown through aliased recursive CTE references [datafusion]

2025-10-02 Thread via GitHub
kosiew opened a new pull request, #17875: URL: https://github.com/apache/datafusion/pull/17875 ## Which issue does this PR close? * Closes #16684. ## Rationale for this change The projection-pruning rule in the optimizer previously treated any `SubqueryAlias` whose a

Re: [PR] chore(deps): bump crate-ci/typos from 1.37.0 to 1.37.1 [datafusion]

2025-10-02 Thread via GitHub
Jefffrey merged PR #17878: URL: https://github.com/apache/datafusion/pull/17878

Re: [PR] chore(deps): bump taiki-e/install-action from 2.62.14 to 2.62.16 [datafusion]

2025-10-02 Thread via GitHub
Jefffrey merged PR #17879: URL: https://github.com/apache/datafusion/pull/17879

Re: [I] `ScalarValue::convert_array_to_scalar_vec` doesn't respect null list elements [datafusion]

2025-10-02 Thread via GitHub
vegarsti commented on issue #17749: URL: https://github.com/apache/datafusion/issues/17749#issuecomment-3361121327 > > From what I understand it seems more correct if it returns None rather than the empty list in that case, is that what you're implying as well? And making this change would
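To make the distinction in this thread concrete, here is a minimal sketch of converting a list array into per-row vectors where a null list becomes `None` instead of an empty vector. `SimpleListArray` and `to_vec_of_lists` are hypothetical names, not the arrow-rs or DataFusion API; the real function works on Arrow arrays and `ScalarValue`s.

```rust
/// Minimal stand-in for an Arrow list array (hypothetical, not arrow-rs `ListArray`):
/// list `i` covers `values[offsets[i]..offsets[i + 1]]`, and `validity[i] == false` marks it null.
struct SimpleListArray<T> {
    offsets: Vec<usize>,
    values: Vec<T>,
    validity: Vec<bool>,
}

/// Convert each list to an owned vector, returning `None` for null lists so that
/// a null list stays distinguishable from an empty one.
fn to_vec_of_lists<T: Clone>(array: &SimpleListArray<T>) -> Vec<Option<Vec<T>>> {
    (0..array.validity.len())
        .map(|i| {
            if array.validity[i] {
                Some(array.values[array.offsets[i]..array.offsets[i + 1]].to_vec())
            } else {
                None
            }
        })
        .collect()
}
```

With offsets `[0, 2, 2, 2]` and validity `[true, false, true]`, rows 1 and 2 both span zero values, but only row 2 comes back as `Some(vec![])`.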

Re: [PR] docs: Improve documentation for FunctionFactory / CREATE FUNCTION [datafusion]

2025-10-02 Thread via GitHub
alamb commented on code in PR #17859: URL: https://github.com/apache/datafusion/pull/17859#discussion_r2398990598 ## datafusion/core/src/execution/context/mod.rs: ## @@ -1786,28 +1786,56 @@ impl From for SessionStateBuilder { /// A planner used to add extensions to DataFusion l

Re: [PR] feat: Support reverse function with ArrayType input [datafusion-comet]

2025-10-02 Thread via GitHub
cfmcgrady commented on code in PR #2481: URL: https://github.com/apache/datafusion-comet/pull/2481#discussion_r2399209075 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -163,21 +175,26 @@ class CometArrayExpressionSuite extends CometTestBase wit

[I] It would be great if we could use variable names instead of `$1` and support default values in `CREATE FUNCTION` [datafusion]

2025-10-02 Thread via GitHub
alamb opened a new issue, #17887: URL: https://github.com/apache/datafusion/issues/17887 one note, not related to this PR, it would be great if we could use variable names instead of `$1` and support default values, last time i've tried it did not work ```sql CREATE FUNCTION our_

Re: [PR] Minor: reuse test schemas in simplify tests [datafusion]

2025-10-02 Thread via GitHub
alamb commented on PR #17864: URL: https://github.com/apache/datafusion/pull/17864#issuecomment-3361403521 Thanks @comphead and @martin-g

Re: [PR] docs: Improve documentation for FunctionFactory / CREATE FUNCTION [datafusion]

2025-10-02 Thread via GitHub
alamb commented on code in PR #17859: URL: https://github.com/apache/datafusion/pull/17859#discussion_r2398991705 ## datafusion/core/src/execution/context/mod.rs: ## @@ -1786,28 +1786,56 @@ impl From for SessionStateBuilder { /// A planner used to add extensions to DataFusion l

Re: [PR] docs: Improve documentation for FunctionFactory / CREATE FUNCTION [datafusion]

2025-10-02 Thread via GitHub
milenkovicm commented on PR #17859: URL: https://github.com/apache/datafusion/pull/17859#issuecomment-3361913666 I've merged this, thanks @alamb, great improvement

Re: [PR] docs: Improve documentation for FunctionFactory / CREATE FUNCTION [datafusion]

2025-10-02 Thread via GitHub
milenkovicm merged PR #17859: URL: https://github.com/apache/datafusion/pull/17859

Re: [PR] fix: Fix regression with plan stability tests in CI [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove closed pull request #2492: fix: Fix regression with plan stability tests in CI URL: https://github.com/apache/datafusion-comet/pull/2492

Re: [PR] fix: Fix regression with plan stability tests in CI [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on PR #2492: URL: https://github.com/apache/datafusion-comet/pull/2492#issuecomment-3361934632 This PR is too large to review. I am going to create smaller PRs to build up to this.

Re: [I] Inadvertently merged commits to main without a PR [datafusion-python]

2025-10-02 Thread via GitHub
kylebarron commented on issue #1258: URL: https://github.com/apache/datafusion-python/issues/1258#issuecomment-3361063331 That's fine with me. I think there should be a way in GitHub settings to prevent pushes to main while allowing merging without approvals.

Re: [I] `ScalarValue::convert_array_to_scalar_vec` doesn't respect null list elements [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on issue #17749: URL: https://github.com/apache/datafusion/issues/17749#issuecomment-3361046269 > From what I understand it seems more correct if it returns None rather than the empty list in that case, is that what you're implying as well? And making this change would me

Re: [PR] Remove spurious `Use` in InListExpr display formatted output [datafusion]

2025-10-02 Thread via GitHub
alamb commented on code in PR #17884: URL: https://github.com/apache/datafusion/pull/17884#discussion_r2398980758 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -1453,31 +1464,31 @@ mod tests { let sql_string = fmt_sql(expr.as_ref()).to_string();

Re: [PR] minor: Make `FunctionRegistry` `udafs` and `udwfs` methods mandatory [datafusion]

2025-10-02 Thread via GitHub
alamb merged PR #17847: URL: https://github.com/apache/datafusion/pull/17847

Re: [PR] minor: Make `FunctionRegistry` `udafs` and `udwfs` methods mandatory [datafusion]

2025-10-02 Thread via GitHub
alamb commented on PR #17847: URL: https://github.com/apache/datafusion/pull/17847#issuecomment-3361433471 Thanks @milenkovicm and @martin-g

Re: [PR] Fix failing CI caused by hash collisions [datafusion]

2025-10-02 Thread via GitHub
alamb commented on PR #17886: URL: https://github.com/apache/datafusion/pull/17886#issuecomment-3361389036 I verified locally that running the following command ```shell cargo test --profile ci --exclude datafusion-examples --exclude datafusion-benchmarks --exclude datafusion-sqll

Re: [I] Extended tests (hash collisions) failing on main [datafusion]

2025-10-02 Thread via GitHub
alamb closed issue #17882: Extended tests (hash collisions) failing on main URL: https://github.com/apache/datafusion/issues/17882

Re: [PR] Minor: reuse test schemas in simplify tests [datafusion]

2025-10-02 Thread via GitHub
alamb merged PR #17864: URL: https://github.com/apache/datafusion/pull/17864

Re: [PR] Fix failing CI caused by hash collisions [datafusion]

2025-10-02 Thread via GitHub
alamb merged PR #17886: URL: https://github.com/apache/datafusion/pull/17886

[PR] [WIP] Upgrade to arrow/parquet 57.0.0 [datafusion]

2025-10-02 Thread via GitHub
alamb opened a new pull request, #17888: URL: https://github.com/apache/datafusion/pull/17888 ## Which issue does this PR close? - Related to https://github.com/apache/arrow-rs/issues/7835 ## Rationale for this change Upgrade to the latest arrow ## What changes are

Re: [I] `SortMergeJoinExec` fails to allocate memory but should spill instead [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on issue #2452: URL: https://github.com/apache/datafusion-comet/issues/2452#issuecomment-3361528202 @comphead @parthchandra Here is what I know so far (which is not very much. I am just getting started with understanding DataFusion's code in this area). `sort_and_

Re: [PR] feat: Add `backtrace` feature to simplify enabling native backtraces in `CometNativeException` [datafusion-comet]

2025-10-02 Thread via GitHub
codecov-commenter commented on PR #2515: URL: https://github.com/apache/datafusion-comet/pull/2515#issuecomment-3361567864 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2515?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] feat: Add `backtrace` feature to simplify enabling native backtraces in `CometNativeException` [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove opened a new pull request, #2515: URL: https://github.com/apache/datafusion-comet/pull/2515 ## Which issue does this PR close? N/A ## Rationale for this change Makes life simpler when debugging. ## What changes are included in this PR?

[PR] fix: Enable plan stability tests for `auto` scan [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove opened a new pull request, #2516: URL: https://github.com/apache/datafusion-comet/pull/2516 ## Which issue does this PR close? Partial fix for https://github.com/apache/datafusion-comet/issues/2469 I will follow up with additional PRs to enable plan stability t

Re: [I] `SortMergeJoinExec` fails to allocate memory but should spill instead [datafusion-comet]

2025-10-02 Thread via GitHub
comphead commented on issue #2452: URL: https://github.com/apache/datafusion-comet/issues/2452#issuecomment-3362011811 Yes, you are right @andygrove, the way spilling is done in the sorter and SMJ is: - try to reserve memory - if that fails and spilling is supported, spill. However we need to
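The reserve-then-spill sequence described above can be sketched as follows. `MemoryPool`, `BufferingOperator`, and their methods are hypothetical stand-ins, not DataFusion's `MemoryPool`/`MemoryReservation` API, and spilling to disk is simulated by a counter.

```rust
/// Hypothetical fixed-size memory pool; not DataFusion's `MemoryPool` API.
struct MemoryPool {
    limit: usize,
    used: usize,
}

impl MemoryPool {
    /// Try to reserve `bytes`; fail without changing state if over the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!("cannot allocate {bytes} bytes ({} of {} in use)", self.used, self.limit));
        }
        self.used += bytes;
        Ok(())
    }

    fn shrink(&mut self, bytes: usize) {
        self.used -= bytes;
    }
}

/// Hypothetical buffering operator that spills its in-memory batches when a reservation fails.
struct BufferingOperator {
    in_memory: Vec<Vec<u8>>, // buffered batches
    reserved: usize,         // bytes currently reserved for `in_memory`
    spilled_batches: usize,  // stands in for batches written to disk
    spill_enabled: bool,
}

impl BufferingOperator {
    fn insert(&mut self, pool: &mut MemoryPool, batch: Vec<u8>) -> Result<(), String> {
        let size = batch.len();
        // 1. Try to reserve memory for the new batch.
        if let Err(e) = pool.try_grow(size) {
            // 2. If that fails and spilling is supported, spill everything and retry once.
            if !self.spill_enabled {
                return Err(e);
            }
            self.spill(pool);
            pool.try_grow(size)?;
        }
        self.reserved += size;
        self.in_memory.push(batch);
        Ok(())
    }

    /// Pretend to write buffered batches to disk and release their reservation.
    fn spill(&mut self, pool: &mut MemoryPool) {
        self.spilled_batches += self.in_memory.len();
        self.in_memory.clear();
        pool.shrink(self.reserved);
        self.reserved = 0;
    }
}
```

In this sketch, a single batch larger than the whole pool still fails after spilling, because spilling frees nothing that makes the retry fit; that is one way an operator can report an allocation error even with spilling enabled.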

Re: [PR] fix: Enable plan stability tests for `auto` scan [datafusion-comet]

2025-10-02 Thread via GitHub
codecov-commenter commented on PR #2516: URL: https://github.com/apache/datafusion-comet/pull/2516#issuecomment-3362013357 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2516?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add tests. [datafusion]

2025-10-02 Thread via GitHub
andygrove commented on code in PR #17871: URL: https://github.com/apache/datafusion/pull/17871#discussion_r2399341760 ## datafusion/spark/src/function/aggregate/avg.rs: ## @@ -0,0 +1,337 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] [EPIC] ListingTable object store caching improvements [datafusion]

2025-10-02 Thread via GitHub
BlakeOrth commented on issue #17214: URL: https://github.com/apache/datafusion/issues/17214#issuecomment-3362031834 @alamb I haven't pushed and opened a PR for #17211 yet, but I would be happy to do so if we want to start getting some feedback on the implementation. I actually think that co

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-10-02 Thread via GitHub
BlakeOrth commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3362040035 @alamb It's not entirely clear to me how we should proceed to keep this effort moving. There's been some discussion of using this PR and polishing it up with tests/docs etc vs spli

Re: [I] Is it possible to pass query parameters? (`:param` or `?`) [datafusion-python]

2025-10-02 Thread via GitHub
timsaucer commented on issue #513: URL: https://github.com/apache/datafusion-python/issues/513#issuecomment-3361133274 I've been using Claude to assist me in trying to understand the conventions, so take this with a grain of salt. - DuckDB: `result = duckdb.sql("SELECT * FROM df", df

Re: [I] `ScalarValue::convert_array_to_scalar_vec` doesn't respect null list elements [datafusion]

2025-10-02 Thread via GitHub
vegarsti commented on issue #17749: URL: https://github.com/apache/datafusion/issues/17749#issuecomment-3361121550 take

Re: [PR] Remove spurious `Use` in InListExpr display formatted output [datafusion]

2025-10-02 Thread via GitHub
pepijnve commented on PR #17884: URL: https://github.com/apache/datafusion/pull/17884#issuecomment-3361235907 @Jefffrey I'm fixing the failing test. While I'm at it I'm considering adding another change in this PR that uses Display for the set elements rather than Debug. The output now is w

[I] Implement typo check github action [datafusion-site]

2025-10-02 Thread via GitHub
Jefffrey opened a new issue, #118: URL: https://github.com/apache/datafusion-site/issues/118 Like we have on main repo: https://github.com/apache/datafusion/blob/b81073ad4ca99b8c1139760843639c11cb1dea4a/.github/workflows/rust.yml#L780-L787 But limit to only `content/blog` --

[PR] [branch-50] Backport: `avg(distinct)` support for decimal types (#17560) [datafusion]

2025-10-02 Thread via GitHub
AdamGS opened a new pull request, #17885: URL: https://github.com/apache/datafusion/pull/17885 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/17849 - #2408 ## Rationale for this change Support `avg(decimal)`, see original PR

Re: [PR] Remove spurious `Use` in InListExpr display formatted output [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on PR #17884: URL: https://github.com/apache/datafusion/pull/17884#issuecomment-3361245250 > @Jefffrey I'm fixing the failing test. While I'm at it I'm considering adding another change in this PR that uses Display for the set elements rather than Debug. The output now is

Re: [PR] [branch-50] Backport: `avg(distinct)` support for decimal types (#17560) [datafusion]

2025-10-02 Thread via GitHub
AdamGS commented on PR #17885: URL: https://github.com/apache/datafusion/pull/17885#issuecomment-3361247761 FYI @xudong963, I figured I'd make a separate PR for each commit I want to backport to keep the branch's history clean, so this is the first one.

Re: [PR] build: Add Spark 4.0 to release build script [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on PR #2514: URL: https://github.com/apache/datafusion-comet/pull/2514#issuecomment-3361591859 > The jars for Spark 3.x will work with JDK 11. > > The pom file already specifies > > ``` > 11 > ${java.version} > ${java.version} > ```

Re: [PR] Add `CastColumnExpr` for struct-aware column casting [datafusion]

2025-10-02 Thread via GitHub
alamb commented on PR #17773: URL: https://github.com/apache/datafusion/pull/17773#issuecomment-336147 BTW thank you for pushing this along. I actually think moving fields down through more of DataFusion will help many things, including logical types / Arrow Extension Type support, as

Re: [I] `SortMergeJoinExec` fails to allocate memory but should spill instead [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on issue #2452: URL: https://github.com/apache/datafusion-comet/issues/2452#issuecomment-3358622332 ``` org.apache.comet.CometNativeException: Additional allocation failed for ExternalSorterMerge[9] with top memory consumers (across reservations) as: ExternalSort

[I] JoinSetTracerError is not exported from datafusion-common-runtime [datafusion]

2025-10-02 Thread via GitHub
JanKaul opened a new issue, #17876: URL: https://github.com/apache/datafusion/issues/17876 ## Describe the bug The `JoinSetTracerError` type is used in the public API of `datafusion-common-runtime` but is not exported, making it impossible for downstream users to properly handle erro

[PR] Export JoinSetTracerError from datafusion-common-runtime [datafusion]

2025-10-02 Thread via GitHub
JanKaul opened a new pull request, #17877: URL: https://github.com/apache/datafusion/pull/17877 ## Summary Fixes #17876 The `JoinSetTracerError` type is returned by `set_join_set_tracer()` but was not exported in the public API, making it impossible for downstream users to pro

Re: [PR] Unify Table representations [datafusion-python]

2025-10-02 Thread via GitHub
timsaucer commented on PR #1256: URL: https://github.com/apache/datafusion-python/pull/1256#issuecomment-3360665942 > The remaining blocker is that the unified wrapper no longer exposes `__datafusion_table_provider__`, so anything that depends on that FFI capsule now fails. `PyTableFunctio

Re: [PR] feat: Simplify `NOT(IN ..)` to `NOT IN` and `NOT (EXISTS ..)` to `NOT EXISTS` [datafusion]

2025-10-02 Thread via GitHub
xudong963 merged PR #17848: URL: https://github.com/apache/datafusion/pull/17848
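The rewrite named in this PR's title amounts to folding a `NOT` into the `negated` flag of its child. A minimal sketch over a toy expression enum (hypothetical, not DataFusion's `Expr`) that handles only the root node; an optimizer rule would apply it across the whole tree. The fold is safe under SQL NULL semantics because `x NOT IN (...)` is defined as `NOT (x IN (...))`, and `EXISTS` never returns NULL.

```rust
/// Toy expression tree; hypothetical, not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Not(Box<Expr>),
    InList { expr: Box<Expr>, list: Vec<Expr>, negated: bool },
    Exists { subquery: String, negated: bool },
}

/// Rewrite NOT(x IN (...)) to x NOT IN (...) and NOT(EXISTS ...) to NOT EXISTS ...
/// by flipping the `negated` flag; any other NOT is left unchanged.
fn simplify_not(expr: Expr) -> Expr {
    match expr {
        Expr::Not(inner) => match *inner {
            Expr::InList { expr, list, negated } => Expr::InList { expr, list, negated: !negated },
            Expr::Exists { subquery, negated } => Expr::Exists { subquery, negated: !negated },
            other => Expr::Not(Box::new(other)),
        },
        other => other,
    }
}
```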

[I] Spurious 'Use' in InList display output [datafusion]

2025-10-02 Thread via GitHub
pepijnve opened a new issue, #17883: URL: https://github.com/apache/datafusion/issues/17883 ### Describe the bug When the static set optimisation in InListExpr is used, indent explain output will show `Use expr IN (SET) (array_literal)`. The `Use` string should probably not be there

[PR] Remove spurious `Use` in InListExpr display formatted output [datafusion]

2025-10-02 Thread via GitHub
pepijnve opened a new pull request, #17884: URL: https://github.com/apache/datafusion/pull/17884 ## Which issue does this PR close? - Closes #17883. ## Rationale for this change Aligns the explain output for `IN (SET)` and `NOT IN (SET)`. The presence of `Use` is a bit a

[PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.16 [datafusion-sandbox]

2025-10-02 Thread via GitHub
dependabot[bot] opened a new pull request, #23: URL: https://github.com/apache/datafusion-sandbox/pull/23 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.8 to 2.62.16. Release notes sourced from https://github.com/taiki-e/install-action/releases

Re: [PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.15 [datafusion-sandbox]

2025-10-02 Thread via GitHub
dependabot[bot] commented on PR #22: URL: https://github.com/apache/datafusion-sandbox/pull/22#issuecomment-3360705995 Superseded by #23.

Re: [PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.15 [datafusion-sandbox]

2025-10-02 Thread via GitHub
dependabot[bot] closed pull request #22: chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.15 URL: https://github.com/apache/datafusion-sandbox/pull/22

[PR] fix: Support scalar/array args for rpad/read_side_padding [datafusion-comet]

2025-10-02 Thread via GitHub
wForget opened a new pull request, #2482: URL: https://github.com/apache/datafusion-comet/pull/2482 ## Which issue does this PR close? Closes #2475. ## Rationale for this change Support full scalar/array arguments for rpad/read_side_padding ## What changes

Re: [PR] Fix failing CI caused by hash collisions [datafusion]

2025-10-02 Thread via GitHub
liamzwbao commented on PR #17886: URL: https://github.com/apache/datafusion/pull/17886#issuecomment-3361296390 cc @alamb @xudong963

Re: [I] Extended tests (hash collisions) failing on main [datafusion]

2025-10-02 Thread via GitHub
liamzwbao commented on issue #17882: URL: https://github.com/apache/datafusion/issues/17882#issuecomment-3361270672 take

[PR] Fix failing CI caused by hash collisions [datafusion]

2025-10-02 Thread via GitHub
liamzwbao opened a new pull request, #17886: URL: https://github.com/apache/datafusion/pull/17886 ## Which issue does this PR close? - Closes #17882. ## Rationale for this change ## What changes are included in this PR? Fix the CI failure by asserti

Re: [I] Is it possible to pass query parameters? (`:param` or `?`) [datafusion-python]

2025-10-02 Thread via GitHub
kylebarron commented on issue #513: URL: https://github.com/apache/datafusion-python/issues/513#issuecomment-3361172026 > For example suppose they did `ctx.sql(f"select {key_of_interest}, c_name from {df}", df=df_customer)` then I expect this would go very poorly Yes, you'd need to e

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-10-02 Thread via GitHub
mbutrovich commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2398950970 ## native/core/src/parquet/encryption_support.rs: ## @@ -0,0 +1,151 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

[I] `array_has` returns `false` instead of `null` for `[null]` list [datafusion]

2025-10-02 Thread via GitHub
Jefffrey opened a new issue, #17817: URL: https://github.com/apache/datafusion/issues/17817 ### Describe the bug Given this test: ```rust #[test] fn test_array_has() -> Result<(), DataFusionError> { let haystack_field = Arc::new(Field::new_list(
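For reference, SQL's three-valued logic for membership tests such as `x IN (...)` yields NULL rather than `false` when no element matches but some element is NULL. A minimal sketch of that rule over `Option` values, assuming `array_has` should follow the same semantics (the issue is where that is decided); `array_has_3vl` is a hypothetical name.

```rust
/// Three-valued membership test over a list with nullable elements:
/// Some(true) if a non-null element equals the needle,
/// None (unknown) if nothing matched but the list contains a null,
/// Some(false) otherwise.
fn array_has_3vl<T: PartialEq>(haystack: &[Option<T>], needle: &T) -> Option<bool> {
    let mut saw_null = false;
    for element in haystack {
        match element {
            Some(v) if v == needle => return Some(true),
            Some(_) => {}
            None => saw_null = true,
        }
    }
    if saw_null { None } else { Some(false) }
}
```

Under this rule a `[null]` haystack gives NULL, which is the result the issue title asks for.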

[D] How do you account memory in production-grade analytical engines based on DataFusion? Async Parquet reader makes it difficult [datafusion]

2025-10-02 Thread via GitHub
GitHub user devozerov created a discussion: How do you account memory in production-grade analytical engines based on DataFusion? Async Parquet reader makes it difficult DataFusion accounts memory only for blocking operators, assuming that there is some more or less fixed overhead on other da

Re: [PR] [WIP] Upgrade to arrow/parquet 57.0.0 [datafusion]

2025-10-02 Thread via GitHub
alamb commented on PR #17888: URL: https://github.com/apache/datafusion/pull/17888#issuecomment-3361922832 Many of the current failures are because this used to work: ```sql select arrow_cast('2021-01-01T00:00:00', 'Timestamp(Nanosecond, Some("-05:00"))') ``` or

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-10-02 Thread via GitHub
parthchandra commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2399364065 ## native/core/src/parquet/encryption_support.rs: ## @@ -0,0 +1,151 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] feat:Support ANSI mode integral divide [datafusion-comet]

2025-10-02 Thread via GitHub
coderfender commented on PR #2421: URL: https://github.com/apache/datafusion-comet/pull/2421#issuecomment-3362063520 @andygrove please take a look whenever you get a chance. (The failed tests are due to a transient issue.)

Re: [PR] Fix failing CI caused by hash collisions [datafusion]

2025-10-02 Thread via GitHub
alamb commented on code in PR #17886: URL: https://github.com/apache/datafusion/pull/17886#discussion_r2398959094 ## datafusion/core/tests/physical_optimizer/partition_statistics.rs: ## @@ -440,8 +440,7 @@ mod test { partition_row_counts.push(total_rows); }

Re: [PR] feat: Support reading CSV files with inconsistent column counts [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on PR #17553: URL: https://github.com/apache/datafusion/pull/17553#issuecomment-3364010908 I think the problem is the infer schema code runs for a certain number of input rows (configurable via `schema_infer_max_rec`); so if the schema changes across these rows, it gets p
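The interaction described above can be illustrated with a small sketch: the column count is inferred from only the first `max_records` rows (the role `schema_infer_max_rec` plays), shorter rows are padded with NULLs, and wider rows that appear after the inference window cannot be mapped. The helpers below are hypothetical, operate on pre-split string fields rather than a real CSV reader, and padding with NULLs is an assumption about the desired behavior.

```rust
/// Infer the number of columns from at most `max_records` rows, taking the widest row seen.
fn infer_column_count(rows: &[Vec<&str>], max_records: usize) -> usize {
    rows.iter().take(max_records).map(|r| r.len()).max().unwrap_or(0)
}

/// Pad rows shorter than the inferred width with NULLs (`None`); reject wider rows,
/// since the inferred schema has no columns for their extra fields.
fn normalize_row<'a>(row: &[&'a str], width: usize) -> Result<Vec<Option<&'a str>>, String> {
    if row.len() > width {
        return Err(format!("row has {} fields but the inferred schema has {}", row.len(), width));
    }
    let mut out: Vec<Option<&'a str>> = row.iter().map(|f| Some(*f)).collect();
    out.resize(width, None);
    Ok(out)
}
```

With an inference window of two rows, a three-field row arriving third is rejected even though a window of three would have accepted it.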

Re: [PR] chore: utilize trait upcasting for AsyncScalarUDF PartialEq & Hash [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on code in PR #17872: URL: https://github.com/apache/datafusion/pull/17872#discussion_r2400580979 ## datafusion/expr/src/async_udf.rs: ## @@ -132,3 +128,128 @@ impl Display for AsyncScalarUDF { write!(f, "AsyncScalarUDF: {}", self.inner.name()) }

Re: [PR] optimizer: allow projection pushdown through aliased recursive CTE references [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on code in PR #17875: URL: https://github.com/apache/datafusion/pull/17875#discussion_r2400723460 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -727,6 +727,98 @@ async fn parquet_explain_analyze() { assert_contains!(&formatted, "row_groups_pruned

Re: [I] Consider using `cargo-nextest` in CI [datafusion-comet]

2025-10-02 Thread via GitHub
comphead commented on issue #2507: URL: https://github.com/apache/datafusion-comet/issues/2507#issuecomment-3363625942 Thanks @andygrove, are most of the tests Scala-based?

Re: [PR] Support `AS`, `UNION`, `INTERSECTION`, `EXCEPT`, `AGGREGATE` pipe operators [datafusion]

2025-10-02 Thread via GitHub
Jefffrey merged PR #17312: URL: https://github.com/apache/datafusion/pull/17312

Re: [PR] More decimal 32/64 support - type coercion and misc gaps [datafusion]

2025-10-02 Thread via GitHub
AdamGS commented on code in PR #17808: URL: https://github.com/apache/datafusion/pull/17808#discussion_r2387744827 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -955,28 +974,111 @@ pub fn decimal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option get_wi
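The snippet above is cut off, but a common rule for coercing two decimal types to a common type is to keep the larger scale and enough integer digits for either side, capping precision at the type's maximum (9, 18, and 38 digits for 32-, 64-, and 128-bit decimals). A sketch of that rule, assuming it resembles what the coercion code does; `wider_decimal` is a hypothetical name, and how the real code handles the capped case is not shown in the snippet.

```rust
/// Widest (precision, scale) that keeps every fractional digit of both inputs and
/// as many integer digits as either needs, capped at `max_precision`.
fn wider_decimal(p1: u8, s1: i8, p2: u8, s2: i8, max_precision: u8) -> (u8, i8) {
    let scale = s1.max(s2);
    // Integer digits = precision - scale for each side; keep the larger.
    let integer_digits = (p1 as i16 - s1 as i16).max(p2 as i16 - s2 as i16);
    // Capping can drop integer digits, so values near the limit may overflow.
    let precision = (integer_digits + scale as i16).min(max_precision as i16);
    (precision as u8, scale)
}
```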

Re: [I] `string_agg` aggregate function is 1000x slower than duckdb (SQLStorm) [datafusion]

2025-10-02 Thread via GitHub
2010YOUY01 commented on issue #17789: URL: https://github.com/apache/datafusion/issues/17789#issuecomment-335901 I did a PR https://github.com/apache/datafusion/pull/17837 to add a specialized accumulator for no `DISTINCT/ORDER` case. I think it's also necessary in addition to the
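A specialized accumulator for the case without `DISTINCT` or `ORDER BY` can simply append each non-null value to one output buffer. The sketch below illustrates that idea; `SimpleStringAgg` is hypothetical and does not implement DataFusion's `Accumulator` trait.

```rust
/// Hypothetical `string_agg(value, delimiter)` state for the no-DISTINCT, no-ORDER BY case.
struct SimpleStringAgg {
    delimiter: String,
    buffer: String,
    has_value: bool,
}

impl SimpleStringAgg {
    fn new(delimiter: &str) -> Self {
        Self { delimiter: delimiter.to_string(), buffer: String::new(), has_value: false }
    }

    /// Append one value, inserting the delimiter before every value except the first.
    fn push(&mut self, value: &str) {
        if self.has_value {
            self.buffer.push_str(&self.delimiter);
        }
        self.buffer.push_str(value);
        self.has_value = true;
    }

    /// Consume a batch of input values; NULLs are skipped.
    fn update(&mut self, values: &[Option<&str>]) {
        for v in values.iter().flatten() {
            self.push(v);
        }
    }

    /// Combine the partial result of another partition (order across partitions is unspecified).
    fn merge(&mut self, other: &SimpleStringAgg) {
        if other.has_value {
            self.push(&other.buffer);
        }
    }

    /// NULL (`None`) when no non-null input was seen.
    fn evaluate(&self) -> Option<String> {
        if self.has_value { Some(self.buffer.clone()) } else { None }
    }
}
```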

Re: [PR] perf: Faster `string_agg()` aggregate function (1000x speed for no DISTINCT and ORDER case) [datafusion]

2025-10-02 Thread via GitHub
vegarsti commented on PR #17837: URL: https://github.com/apache/datafusion/pull/17837#issuecomment-3351363095 Amazing!

Re: [PR] feat: Support reading CSV files with inconsistent column counts [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on PR #17553: URL: https://github.com/apache/datafusion/pull/17553#issuecomment-3363962427 > Hi @alamb — I just wanted to check if you had a chance to see my comment here: > > > > I tried the reproducer from #17516 and it still fails on this PR: > > > Maybe I don

Re: [PR] fix: Throws an exception when struct type has duplicate keys [datafusion-comet]

2025-10-02 Thread via GitHub
comphead commented on code in PR #2459: URL: https://github.com/apache/datafusion-comet/pull/2459#discussion_r2400400401 ## common/src/main/scala/org/apache/spark/sql/comet/util/Utils.scala: ## @@ -154,7 +154,13 @@ object Utils extends CometTypeShim { name,

[PR] fix: fix regression in tpcbench.py [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove opened a new pull request, #2512: URL: https://github.com/apache/datafusion-comet/pull/2512 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-10-02 Thread via GitHub
mbutrovich commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2391075529 ## common/src/main/scala/org/apache/comet/parquet/CometParquetUtils.scala: ## @@ -20,13 +20,30 @@ package org.apache.comet.parquet import org.apache.had

[PR] Implement `AsRef` for `Expr` [datafusion]

2025-10-02 Thread via GitHub
findepi opened a new pull request, #17819: URL: https://github.com/apache/datafusion/pull/17819 This allows writing a function that accepts `&[Expr]` or `&[&Expr]`, thus allowing less cloning when inspecting expression trees.
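A self-contained sketch of why `impl AsRef<Expr> for Expr` helps: combined with the standard library's blanket `impl<T: ?Sized + AsRef<U>, U: ?Sized> AsRef<U> for &T`, one generic function can take both `&[Expr]` and `&[&Expr]`. The `Expr` enum and `column_names` function here are hypothetical stand-ins, not DataFusion's types.

```rust
/// Hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
}

/// The kind of impl the PR describes: an `Expr` is trivially a reference to itself.
impl AsRef<Expr> for Expr {
    fn as_ref(&self) -> &Expr {
        self
    }
}

/// Collect referenced column names without cloning any expression.
/// `E` can be `Expr` or `&Expr` thanks to the blanket impl for references.
fn column_names<E: AsRef<Expr>>(exprs: &[E]) -> Vec<&str> {
    exprs
        .iter()
        .filter_map(|e| match e.as_ref() {
            Expr::Column(name) => Some(name.as_str()),
            Expr::Literal(_) => None,
        })
        .collect()
}
```

Callers holding a `Vec<Expr>` pass it directly, and callers that collected `Vec<&Expr>` while walking a tree pass that, with no `.cloned()` in between.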

Re: [I] Consider using `cargo-nextest` in CI [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on issue #2507: URL: https://github.com/apache/datafusion-comet/issues/2507#issuecomment-3363739183 The Rust tests are sometimes failing in CI and we do not see why. I am hoping this will help.

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-10-02 Thread via GitHub
alamb commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3363076396 Thank you for your patience @BlakeOrth

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-10-02 Thread via GitHub
pepijnve commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3342818767 > Is that supposed to be allowed? In other words, should the simplifier be handling this case or not? Looking at `LogicalPlan::map_expressions` I would say no. The code t

Re: [PR] fix: `ParquetSource` - `with_predicate()` don't have to reset metrics [datafusion]

2025-10-02 Thread via GitHub
2010YOUY01 closed pull request #17858: fix: `ParquetSource` - `with_predicate()` don't have to reset metrics URL: https://github.com/apache/datafusion/pull/17858

Re: [PR] Reduce cloning in LogicalPlanBuilder [datafusion]

2025-10-02 Thread via GitHub
joroKr21 commented on code in PR #17675: URL: https://github.com/apache/datafusion/pull/17675#discussion_r2390547786 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1994,32 +2005,29 @@ pub fn table_scan_with_filter_and_fetch( ) } -pub fn table_source(table_schema:

Re: [PR] Optimize CASE expressions by removing WHEN false branches [datafusion]

2025-10-02 Thread via GitHub
alamb closed pull request #17628: Optimize CASE expressions by removing WHEN false branches URL: https://github.com/apache/datafusion/pull/17628

Re: [I] TPCH q13 crash [datafusion-comet]

2025-10-02 Thread via GitHub
wForget commented on issue #2480: URL: https://github.com/apache/datafusion-comet/issues/2480#issuecomment-3345159863 Seems related to https://github.com/apache/datafusion-benchmarks/blob/main/tpch/queries/q13.sql#L14

Re: [PR] chore: utilize trait upcasting for AsyncScalarUDF PartialEq & Hash [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on code in PR #17872: URL: https://github.com/apache/datafusion/pull/17872#discussion_r2400578220 ## datafusion/expr/src/async_udf.rs: ## @@ -62,17 +61,14 @@ pub struct AsyncScalarUDF { impl PartialEq for AsyncScalarUDF { fn eq(&self, other: &Self) ->

Re: [PR] Implement arithmetic overflow error handling [datafusion]

2025-10-02 Thread via GitHub
EeshanBembi commented on PR #17554: URL: https://github.com/apache/datafusion/pull/17554#issuecomment-3343047310 > @EeshanBembi, I am curious about the performance impact of enabling overflow checking by default. Could you add criterion benchmarks? Hi Andy! I've implemented the crite

Re: [PR] feat:support_integral_decimal_cast_native_impl [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on code in PR #2472: URL: https://github.com/apache/datafusion-comet/pull/2472#discussion_r2394825713 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -322,8 +322,7 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelpe

Re: [PR] perf: Optimize CASE for any WHEN false [datafusion]

2025-10-02 Thread via GitHub
petern48 commented on code in PR #17835: URL: https://github.com/apache/datafusion/pull/17835#discussion_r2389665889 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1436,33 +1436,59 @@ impl TreeNodeRewriter for Simplifier<'_, S> { // C

Re: [PR] chore: utilize trait upcasting for AsyncScalarUDF PartialEq & Hash [datafusion]

2025-10-02 Thread via GitHub
Jefffrey commented on code in PR #17872: URL: https://github.com/apache/datafusion/pull/17872#discussion_r2400582985 ## datafusion/expr/src/async_udf.rs: ## @@ -132,3 +128,128 @@ impl Display for AsyncScalarUDF { write!(f, "AsyncScalarUDF: {}", self.inner.name()) }

[PR] infra: macos-13 is deprecated [datafusion-ballista]

2025-10-02 Thread via GitHub
kevinjqliu opened a new pull request, #1324: URL: https://github.com/apache/datafusion-ballista/pull/1324 # Which issue does this PR close? Closes #. # Rationale for this change https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-clo

[PR] macos-13 is deprecated [datafusion-python]

2025-10-02 Thread via GitHub
kevinjqliu opened a new pull request, #1259: URL: https://github.com/apache/datafusion-python/pull/1259 # Which issue does this PR close? Closes #. # Rationale for this change https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closi

[PR] infra: macos-13 is deprecated [datafusion-ray]

2025-10-02 Thread via GitHub
kevinjqliu opened a new pull request, #89: URL: https://github.com/apache/datafusion-ray/pull/89 https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/

Re: [PR] Add JoinType preservation helpers and `dynamic_filter_side`; enable dynamic filter pushdown in HashJoinExec [datafusion]

2025-10-02 Thread via GitHub
crystalxyz commented on code in PR #17518: URL: https://github.com/apache/datafusion/pull/17518#discussion_r2400014974 ## datafusion/common/src/join_type.rs: ## @@ -74,6 +74,12 @@ pub enum JoinType { RightMark, } +const LEFT_PRESERVING: &[JoinType] = +&[JoinType::Lef

Re: [I] PROPOSAL Hash Join Spilling Proposal [datafusion]

2025-10-02 Thread via GitHub
camuel commented on issue #17267: URL: https://github.com/apache/datafusion/issues/17267#issuecomment-3364176121 From my tests SMJ requires more memory than HJ with a large number of partitions (like 32), so switching from HJ to SMJ makes memory situation worse not better in that case.

Re: [I] Enable `supports_filter_during_aggregation` for the `Generic` Dialect [datafusion]

2025-10-02 Thread via GitHub
Jefffrey closed issue #15719: Enable `supports_filter_during_aggregation` for the `Generic` Dialect URL: https://github.com/apache/datafusion/issues/15719

Re: [PR] feat: Implement ANSI support for Round [datafusion-comet]

2025-10-02 Thread via GitHub
coderfender commented on PR #989: URL: https://github.com/apache/datafusion-comet/pull/989#issuecomment-3362378663 I am working on this

Re: [PR] feat: Add plan conversion statistics to extended explain info [datafusion-comet]

2025-10-02 Thread via GitHub
andygrove commented on PR #2412: URL: https://github.com/apache/datafusion-comet/pull/2412#issuecomment-3348819831 Thanks for the reviews @comphead and @parthchandra. Could you take another look?

Re: [PR] [forward port] Change version to 50.1.0 and add changelog (#17748) [datafusion]

2025-10-02 Thread via GitHub
xudong963 merged PR #17826: URL: https://github.com/apache/datafusion/pull/17826
