[PR] chore(deps): bump taiki-e/install-action from 2.62.29 to 2.62.31 [datafusion]

2025-10-16 Thread via GitHub
dependabot[bot] opened a new pull request, #18094: URL: https://github.com/apache/datafusion/pull/18094 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.29 to 2.62.31. Release notes Sourced from https://github.com/taiki-e/install-action/releases

Re: [I] DictionaryKeyOverflowError on DataFrame.write_parquet [datafusion]

2025-10-16 Thread via GitHub
duongcongtoai commented on issue #17445: URL: https://github.com/apache/datafusion/issues/17445#issuecomment-3410520759 i think so, is only checking if the ptrs are equal between dictionary, this will be broken when: - spilling - repartition (if you set the original script to only use

Re: [PR] docs: Publish 0.11.0 user guide [datafusion-comet]

2025-10-16 Thread via GitHub
mbutrovich merged PR #2589: URL: https://github.com/apache/datafusion-comet/pull/2589 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] Introduce `expr_fields` to `AccumulatorArgs` to hold input argument fields [datafusion]

2025-10-16 Thread via GitHub
kosiew commented on PR #18100: URL: https://github.com/apache/datafusion/pull/18100#issuecomment-3413462730 @Jefffrey Your approach is an improvement! ✅ Simpler implementation - straightforward addition of pre-computed fields ✅ Less cognitive overhead - users don't need to u

[PR] move repartition to insta [datafusion]

2025-10-16 Thread via GitHub
blaginin opened a new pull request, #18106: URL: https://github.com/apache/datafusion/pull/18106 Related https://github.com/apache/datafusion/pull/16324 https://github.com/apache/datafusion/pull/16617 almost there!

Re: [I] Release DataFusion `50.3.0` (minor) [datafusion]

2025-10-16 Thread via GitHub
avantgardnerio commented on issue #18072: URL: https://github.com/apache/datafusion/issues/18072#issuecomment-341317 I created https://github.com/apache/datafusion/pull/18107 . Sorry for the delay!

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-10-16 Thread via GitHub
pepijnve commented on PR #17813: URL: https://github.com/apache/datafusion/pull/17813#issuecomment-3409913434 Poking @comphead since they reviewed that PR as well.

Re: [PR] add msrvcheck [datafusion-ballista]

2025-10-16 Thread via GitHub
milenkovicm commented on PR #1328: URL: https://github.com/apache/datafusion-ballista/pull/1328#issuecomment-3409959674 all right, if I understand correctly, if datafusion is 1.85 we could decide to have ballista at 1.86 but not 1.84, as datafusion might not work in 1.84?
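The MSRV question above comes down to one rule: a downstream crate's minimum supported Rust version cannot be lower than that of any of its dependencies. A minimal sketch of that check, assuming plain `major.minor` version strings; `parse_version` and `msrv_is_compatible` are hypothetical helpers for illustration and are not part of Ballista, DataFusion, or Cargo:

```rust
/// Parses a "major.minor" Rust version such as "1.85" into a comparable tuple.
/// Returns None for anything else (for example "1.85.0" or "stable").
fn parse_version(v: &str) -> Option<(u32, u32)> {
    let (major, minor) = v.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Hypothetical check: a downstream MSRV is workable only if it is
/// greater than or equal to the dependency's MSRV.
fn msrv_is_compatible(downstream: &str, dependency: &str) -> Option<bool> {
    Some(parse_version(downstream)? >= parse_version(dependency)?)
}
```

With a dependency at 1.85, a downstream MSRV of 1.86 or 1.85 passes this check and 1.84 does not, which matches the reading in the comment.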

Re: [PR] "Gentle Introduction to Arrow / Record Batches" #11336 [datafusion]

2025-10-16 Thread via GitHub
sm4rtm4art commented on code in PR #18051: URL: https://github.com/apache/datafusion/pull/18051#discussion_r2435793583 ## docs/source/user-guide/dataframe.md: ## @@ -109,6 +111,10 @@ async fn main() -> Result<()> { } ``` +--- + +# REFERENCES + Review Comment: Sorry, forg

Re: [PR] Improve datafusion-cli object store profiling summary display [datafusion]

2025-10-16 Thread via GitHub
BlakeOrth commented on code in PR #18085: URL: https://github.com/apache/datafusion/pull/18085#discussion_r2436711776 ## datafusion-cli/src/object_storage/instrumented.rs: ## @@ -537,26 +662,14 @@ mod tests { extra_display: None, }); -let summarie

Re: [PR] feat: Support configurable explain analyze detail level [datafusion]

2025-10-16 Thread via GitHub
alamb commented on code in PR #18098: URL: https://github.com/apache/datafusion/pull/18098#discussion_r2436828504 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -158,6 +159,40 @@ async fn explain_analyze_baseline_metrics() { fn nanos_from_timestamp(ts: &Timestamp) -> i6

Re: [PR] Add extra case_when benchmarks [datafusion]

2025-10-16 Thread via GitHub
alamb commented on code in PR #18097: URL: https://github.com/apache/datafusion/pull/18097#discussion_r2436883863 ## datafusion/physical-expr/benches/case_when.rs: ## @@ -54,69 +53,148 @@ fn criterion_benchmark(c: &mut Criterion) { let c1 = Arc::new(c1.finish()); let c

Re: [PR] chore: use `NullBuffer::union` for Spark `concat` [datafusion]

2025-10-16 Thread via GitHub
alamb commented on code in PR #18087: URL: https://github.com/apache/datafusion/pull/18087#discussion_r2436968088 ## datafusion/spark/src/function/string/concat.rs: ## @@ -31,6 +32,10 @@ use std::sync::Arc; /// /// Concatenates multiple input strings into a single string. ///

Re: [PR] feat: Add percentile_cont aggregate function [datafusion]

2025-10-16 Thread via GitHub
Jefffrey commented on code in PR #17988: URL: https://github.com/apache/datafusion/pull/17988#discussion_r2438329135 ## datafusion/functions-aggregate/src/percentile_cont.rs: ## @@ -0,0 +1,839 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

[PR] fix: Repull latest datafusion-testing module so extended tests succeed [datafusion]

2025-10-16 Thread via GitHub
Jefffrey opened a new pull request, #18110: URL: https://github.com/apache/datafusion/pull/18110 Looks like #17988 accidentally reverted the bump from #18096

Re: [PR] "Gentle Introduction to Arrow / Record Batches" #11336 [datafusion]

2025-10-16 Thread via GitHub
Jefffrey commented on code in PR #18051: URL: https://github.com/apache/datafusion/pull/18051#discussion_r2436117873 ## docs/source/user-guide/arrow-introduction.md: ## @@ -0,0 +1,301 @@ + + +# A Gentle Introduction to Arrow & RecordBatches (for DataFusion users) + +```{contents

Re: [I] Documentation site rendering issue [datafusion-comet]

2025-10-16 Thread via GitHub
mbutrovich closed issue #2580: Documentation site rendering issue URL: https://github.com/apache/datafusion-comet/issues/2580

Re: [PR] "Gentle Introduction to Arrow / Record Batches" #11336 [datafusion]

2025-10-16 Thread via GitHub
Jefffrey commented on code in PR #18051: URL: https://github.com/apache/datafusion/pull/18051#discussion_r2436102534 ## docs/source/user-guide/arrow-introduction.md: ## @@ -0,0 +1,301 @@ + + +# A Gentle Introduction to Arrow & RecordBatches (for DataFusion users) + +```{contents

Re: [PR] "Gentle Introduction to Arrow / Record Batches" #11336 [datafusion]

2025-10-16 Thread via GitHub
Jefffrey commented on code in PR #18051: URL: https://github.com/apache/datafusion/pull/18051#discussion_r2436140831 ## docs/source/user-guide/arrow-introduction.md: ## @@ -0,0 +1,301 @@ + + +# A Gentle Introduction to Arrow & RecordBatches (for DataFusion users) + +```{contents

[PR] fix: update REST API route syntax for axum 0.8 compatibility [datafusion-ballista]

2025-10-16 Thread via GitHub
tomsanbear opened a new pull request, #1330: URL: https://github.com/apache/datafusion-ballista/pull/1330 # Which issue does this PR close? Closes #1329 # Rationale for this change Axum 0.8 changed route parameter syntax from `:param` to `{param}`. Old syntax causes pani
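The route syntax change described in this PR can be illustrated without pulling in axum itself. The sketch below rewrites old-style path segments (`:param`, and the wildcard form `*rest`) into the brace syntax that axum 0.8 expects (`{param}`, `{*rest}`). `migrate_route` is a hypothetical helper written for this illustration; it is not part of axum or Ballista, and the actual PR presumably edits the route strings by hand.

```rust
/// Hypothetical helper: converts axum 0.7-style route segments to the
/// axum 0.8 brace syntax. Segments without a `:` or `*` prefix are kept as-is.
fn migrate_route(path: &str) -> String {
    path.split('/')
        .map(|segment| {
            if let Some(name) = segment.strip_prefix(':') {
                // `:job_id` becomes `{job_id}`
                format!("{{{name}}}")
            } else if let Some(name) = segment.strip_prefix('*') {
                // `*rest` becomes `{*rest}`
                format!("{{*{name}}}")
            } else {
                segment.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("/")
}
```

For example, `migrate_route("/api/job/:job_id")` yields `/api/job/{job_id}`; the example path is made up and not taken from the Ballista scheduler.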

Re: [PR] refactor: remove unused `type_coercion/aggregate.rs` functions [datafusion]

2025-10-16 Thread via GitHub
comphead commented on code in PR #18091: URL: https://github.com/apache/datafusion/pull/18091#discussion_r2437264928 ## datafusion/expr-common/src/type_coercion/aggregates.rs: ## @@ -16,31 +16,11 @@ // under the License. use crate::signature::TypeSignature; -use arrow::datat

Re: [I] Expand use of sql parsing string expressions in DataFrame [datafusion-python]

2025-10-16 Thread via GitHub
milenkovicm commented on issue #1278: URL: https://github.com/apache/datafusion-python/issues/1278#issuecomment-3412620111 I hope to tick a few of them off the list this weekend

Re: [I] Do we need to align the number of Tokio runtime threads with the number of executor cores? [datafusion-comet]

2025-10-16 Thread via GitHub
andygrove commented on issue #2572: URL: https://github.com/apache/datafusion-comet/issues/2572#issuecomment-3412619683 > > Maybe the default number of Tokio worker threads should be aligned with the number of Spark executor cores. > > Is it necessary to create a PR for alignment? [

Re: [I] unexpected output for `concat` for arrays [datafusion]

2025-10-16 Thread via GitHub
comphead commented on issue #18020: URL: https://github.com/apache/datafusion/issues/18020#issuecomment-3412565148 Thanks @EeshanBembi it might actually work, would be nice if you create a draft PR and explore how that would work.

Re: [I] Explore integration with Delta Lake [datafusion-comet]

2025-10-16 Thread via GitHub
Nassiel commented on issue #174: URL: https://github.com/apache/datafusion-comet/issues/174#issuecomment-3408362314 Is there any pull request or roadmap to include this or for the moment is just a nice to have in this thread? We heavily use spark with deltalake but despite our heavy

Re: [I] extended tests failures on main [datafusion]

2025-10-16 Thread via GitHub
alamb commented on issue #18084: URL: https://github.com/apache/datafusion/issues/18084#issuecomment-3410238939 Here is a PR to fix them - https://github.com/apache/datafusion/pull/18096

[PR] Adds instrumentation to LIST operations in CLI [datafusion]

2025-10-16 Thread via GitHub
BlakeOrth opened a new pull request, #18103: URL: https://github.com/apache/datafusion/pull/18103 ## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - https://github.com/apache/datafusion/issues/17207 The f

[I] Improve main landing page in Comet site [datafusion-comet]

2025-10-16 Thread via GitHub
andygrove opened a new issue, #2592: URL: https://github.com/apache/datafusion-comet/issues/2592 ### What is the problem the feature request solves? The main landing page is a little underwhelming: https://github.com/user-attachments/assets/f017b045-4b16-4910-9f3a-b61909be1c9c

Re: [PR] feat: Add progress bar with ETA estimation to datafusion-cli [datafusion]

2025-10-16 Thread via GitHub
pepijnve commented on code in PR #17867: URL: https://github.com/apache/datafusion/pull/17867#discussion_r2435067371 ## datafusion-cli/src/progress/plan_introspect.rs: ## @@ -0,0 +1,152 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Adds Trace and Summary to CLI instrumented stores [datafusion]

2025-10-16 Thread via GitHub
alamb merged PR #18064: URL: https://github.com/apache/datafusion/pull/18064

Re: [I] Datafusion 50 Performance Regression (array_has style filter/join for Parquet data set) [datafusion]

2025-10-16 Thread via GitHub
alamb commented on issue #18070: URL: https://github.com/apache/datafusion/issues/18070#issuecomment-3411006833 I was able to reproduce this. Thank you for the report @ianthetechie repo.sql ```sql CREATE EXTERNAL TABLE categories_raw STORED AS PARQUET LOCATION

Re: [I] Support AVRO Format for Write Queries [datafusion]

2025-10-16 Thread via GitHub
alamb commented on issue #7679: URL: https://github.com/apache/datafusion/issues/7679#issuecomment-3412302891 In case you want to start early, I have a PR that integrates with the latest arrow main that you could use until we actually release arrow 57 (likely will be late next week) - ht

Re: [I] `SubqueryAlias`, `Values`, and/or `EmptyRelation` have incorrect schemas after replacing `Placeholder` values [datafusion]

2025-10-16 Thread via GitHub
paleolimbot commented on issue #18102: URL: https://github.com/apache/datafusion/issues/18102#issuecomment-3412355785 Possibly related is that placeholders cannot be aliased: ```rust let ctx = SessionContext::new(); let df = ctx.sql("SELECT $1 AS one, $2 AS two").await.unwrap();

Re: [I] Scheduler panics at startup when REST API enabled with Axum 0.8 [datafusion-ballista]

2025-10-16 Thread via GitHub
milenkovicm closed issue #1329: Scheduler panics at startup when REST API enabled with Axum 0.8 URL: https://github.com/apache/datafusion-ballista/issues/1329

Re: [PR] Adds instrumentation to LIST operations in CLI [datafusion]

2025-10-16 Thread via GitHub
BlakeOrth commented on code in PR #18103: URL: https://github.com/apache/datafusion/pull/18103#discussion_r2437209184 ## datafusion-cli/src/object_storage/instrumented.rs: ## @@ -186,6 +209,10 @@ impl ObjectStore for InstrumentedObjectStore { } fn list(&self, prefix:

Re: [PR] add msrvcheck [datafusion-ballista]

2025-10-16 Thread via GitHub
milenkovicm commented on PR #1328: URL: https://github.com/apache/datafusion-ballista/pull/1328#issuecomment-3412512922 thanks @killzoner

Re: [PR] infra: remove build-macos-x86_64 [datafusion-ballista]

2025-10-16 Thread via GitHub
milenkovicm commented on code in PR #1325: URL: https://github.com/apache/datafusion-ballista/pull/1325#discussion_r2437218126 ## .github/workflows/build.yml: ## @@ -105,15 +105,14 @@ jobs: name: python-wheel-license path: LICENSE.txt - build-python-mac

Re: [PR] Adds instrumentation to LIST operations in CLI [datafusion]

2025-10-16 Thread via GitHub
BlakeOrth commented on PR #18103: URL: https://github.com/apache/datafusion/pull/18103#issuecomment-3412526966 > I tested this locally, and I don't see any LIST appearing in the output 🤔 > > I think you have to also instrument `list_with_delimiter` 🤔 > > ```sql > andrewlamb@An

Re: [PR] Add spilling to RepartitionExec [datafusion]

2025-10-16 Thread via GitHub
adriangb commented on PR #18014: URL: https://github.com/apache/datafusion/pull/18014#issuecomment-3410593758 > I have a question: are we assuming that it's not possible to make RepartitionExec memory-constant (i.e., O(n_partitions * batch_size) memory)? Is this due to an engineering limita

Re: [PR] feat:support ansi mode remainder function [datafusion-comet]

2025-10-16 Thread via GitHub
andygrove merged PR #2556: URL: https://github.com/apache/datafusion-comet/pull/2556

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-10-16 Thread via GitHub
pepijnve commented on code in PR #17813: URL: https://github.com/apache/datafusion/pull/17813#discussion_r2436653038 ## datafusion/core/tests/tpcds_planning.rs: ## @@ -1052,9 +1052,12 @@ async fn regression_test(query_no: u8, create_physical: bool) -> Result<()> { for sql

[PR] docs: Update benchmark results [datafusion-comet]

2025-10-16 Thread via GitHub
andygrove opened a new pull request, #2596: URL: https://github.com/apache/datafusion-comet/pull/2596 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/2594 ## Rationale for this change Update benchmark results.

Re: [PR] feat: support session token parameter for AmazonS3 [datafusion-python]

2025-10-16 Thread via GitHub
timsaucer merged PR #1275: URL: https://github.com/apache/datafusion-python/pull/1275

Re: [I] Support S3 session_token [datafusion-python]

2025-10-16 Thread via GitHub
timsaucer closed issue #1133: Support S3 session_token URL: https://github.com/apache/datafusion-python/issues/1133

Re: [PR] Add PostgreSQL-style named arguments support for scalar functions [datafusion]

2025-10-16 Thread via GitHub
Omega359 commented on PR #18019: URL: https://github.com/apache/datafusion/pull/18019#issuecomment-3411776423 Nice! I'll try and find time to review this if no one beats me to it in the next few days.

Re: [I] Release DataFusion `50.3.0` (minor) [datafusion]

2025-10-16 Thread via GitHub
alamb commented on issue #18072: URL: https://github.com/apache/datafusion/issues/18072#issuecomment-3408134220 Another potential candidate: - https://github.com/apache/datafusion/issues/18070

Re: [I] Discussion: API for Join Access Path and Join Order Selection [datafusion]

2025-10-16 Thread via GitHub
NGA-TRAN commented on issue #17718: URL: https://github.com/apache/datafusion/issues/17718#issuecomment-3412855224 I've put together a write-up on [Join Order Enumeration](https://docs.google.com/document/d/1KjEwrDd9IKDFrWJAYb91vq_B3RWYij9soTM6zcUf62c/edit?tab=t.0#heading=h.lfqsrg5zccgc).

Re: [PR] refactor: move ListingTable over to the catalog-listing-table crate [datafusion]

2025-10-16 Thread via GitHub
timsaucer commented on code in PR #18080: URL: https://github.com/apache/datafusion/pull/18080#discussion_r2433974993 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1638,15 +191,18 @@ mod tests { let ctx = SessionContext::new(); let testdata = dataf

Re: [PR] docs: Add changelog for 0.11.0 release [datafusion-comet]

2025-10-16 Thread via GitHub
mbutrovich commented on PR #2585: URL: https://github.com/apache/datafusion-comet/pull/2585#issuecomment-3408207381 Draft until #2584 merges since it's included in the changelog.

[I] Publish new benchmarks for 0.11.0 release [datafusion-comet]

2025-10-16 Thread via GitHub
andygrove opened a new issue, #2594: URL: https://github.com/apache/datafusion-comet/issues/2594 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_

[PR] WIP: concat for arrays [datafusion]

2025-10-16 Thread via GitHub
comphead opened a new pull request, #18105: URL: https://github.com/apache/datafusion/pull/18105 ## Which issue does this PR close? - Related #18020. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] docs: Update benchmark results [datafusion-comet]

2025-10-16 Thread via GitHub
andygrove commented on PR #2596: URL: https://github.com/apache/datafusion-comet/pull/2596#issuecomment-3413415678 > > Thanks @andygrove I know I'm a nerd, but also would be nice to have an OS where benches were ran on. > > On Linux those are faster than on Mac, but users may still expec

Re: [PR] refactor: remove unused `type_coercion/aggregate.rs` functions [datafusion]

2025-10-16 Thread via GitHub
Jefffrey commented on code in PR #18091: URL: https://github.com/apache/datafusion/pull/18091#discussion_r2437952623 ## datafusion/functions-aggregate/src/average.rs: ## @@ -125,8 +126,61 @@ impl AggregateUDFImpl for Avg { &self.signature } +fn coerce_types(&

Re: [PR] docs: Update HOWTOs for adding new functions [datafusion]

2025-10-16 Thread via GitHub
Jefffrey commented on code in PR #18089: URL: https://github.com/apache/datafusion/pull/18089#discussion_r2437963150 ## docs/source/contributor-guide/howtos.md: ## @@ -139,9 +166,13 @@ After you've confirmed your `taplo` version, you can format all the `.toml` file taplo fmt