Re: [I] Consider PR labels to make reviewing easier [datafusion]

2026-01-31 Thread via GitHub
2010YOUY01 commented on issue #20088: URL: https://github.com/apache/datafusion/issues/20088#issuecomment-3830346724 I think this is a great idea. Technically, an open PR should imply `pending review` and a draft PR should imply `pending author action`. In practice, though, this is ha

Re: [I] Support grouped aggregates with known min/max statistics [datafusion]

2026-01-31 Thread via GitHub
jizezhang commented on issue #19938: URL: https://github.com/apache/datafusion/issues/19938#issuecomment-3830477234 Hi @Dandandan , I am interested in this issue. If it has not yet been worked on, I would like to take a stab at it. Based on my understanding of the code, my thought is tha

[PR] test: add sqllogictest coverage for UDWF return types in information_… [datafusion]

2026-01-31 Thread via GitHub
karuppuchamysuresh opened a new pull request, #20098: URL: https://github.com/apache/datafusion/pull/20098 ## What changes were proposed in this pull request?

Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]

2026-01-31 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3830479590 > Hi @Rachelint any update on this? Continue working today... A bit busy this week, and sorry for delay for the pr. -- This is an automated message from the Apache Git Ser

[PR] Implement core::error::Error for ParserError and TokenizerError [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
LucaCappelletti94 opened a new pull request, #2189: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2189 This PR updates ParserError and TokenizerError to implement core::error::Error instead of std::error::Error. This change enables the Error trait implementation for thes

Re: [PR] Improve sort-based shuffle: single spill file per partition and batch coalescing [datafusion-ballista]

2026-01-31 Thread via GitHub
milenkovicm commented on PR #1431: URL: https://github.com/apache/datafusion-ballista/pull/1431#issuecomment-3830540999 @sqlbenchmark run tpch -s 10 -i 3

Re: [PR] Improve sort-based shuffle: single spill file per partition and batch coalescing [datafusion-ballista]

2026-01-31 Thread via GitHub
sqlbenchmark commented on PR #1431: URL: https://github.com/apache/datafusion-ballista/pull/1431#issuecomment-3830543603 ## @sqlbenchmark usage This bot is whitelisted for DataFusion committers. ### Commands ``` @sqlbenchmark tpch [--iterations N] [--scale-factor N] [

Re: [PR] Add criterion benchmarks for sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove commented on PR #1434: URL: https://github.com/apache/datafusion-ballista/pull/1434#issuecomment-3828839948 @sqlbenchmark criterion --bench sort_shuffle

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749744963 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Improve sort-based shuffle: single spill file per partition and batch coalescing [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove commented on PR #1431: URL: https://github.com/apache/datafusion-ballista/pull/1431#issuecomment-3828840421 @sqlbenchmark criterion --bench sort_shuffle

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on PR #20065: URL: https://github.com/apache/datafusion/pull/20065#issuecomment-3828933253 Thank you for the review and approval @comphead ! Will leave this open for a day or so for any additional feedback (in particular give @jackkleeman a chance to chime in on ht

Re: [PR] feat: [iceberg] allow native Iceberg scans with non-identity transform residuals [datafusion-comet]

2026-01-31 Thread via GitHub
Shekharrajak commented on PR #2948: URL: https://github.com/apache/datafusion-comet/pull/2948#issuecomment-3828937012 Delete operation tests were failing, so we are falling back to Spark for those. Please trigger the workflow to validate now.

Re: [PR] refactor: Change TableScan.projection from indices to expressions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20091: URL: https://github.com/apache/datafusion/pull/20091#discussion_r2749770625 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2807,14 +2829,129 @@ impl TableScan { Ok(Self { table_name, source: table_s

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749753063 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] perf: optimise right for byte access and StringView [datafusion]

2026-01-31 Thread via GitHub
theirix commented on code in PR #20069: URL: https://github.com/apache/datafusion/pull/20069#discussion_r2749774658 ## datafusion/functions/src/unicode/right.rs: ## @@ -119,58 +119,140 @@ impl ScalarUDFImpl for RightFunc { } } -/// Returns last n characters in the string

Re: [PR] fix: prost build keda and TLS RPC example [datafusion-ballista]

2026-01-31 Thread via GitHub
killzoner commented on code in PR #1429: URL: https://github.com/apache/datafusion-ballista/pull/1429#discussion_r2749853256 ## .github/actions/setup-builder/action.yaml: ## @@ -18,6 +18,14 @@ name: Prepare Rust Builder description: 'Prepare Rust Build Environment' inputs: +

Re: [PR] feat: Read sort-based shuffle spill files via stream [datafusion-ballista]

2026-01-31 Thread via GitHub
mattcuento closed pull request #1415: feat: Read sort-based shuffle spill files via stream URL: https://github.com/apache/datafusion-ballista/pull/1415

Re: [I] Add AI tooling disclosure text to contributor guide and fields to PR templates [datafusion]

2026-01-31 Thread via GitHub
milenkovicm commented on issue #18095: URL: https://github.com/apache/datafusion/issues/18095#issuecomment-3829054889 Should we limit use of AI for issues labelled as `good first issue`? IMHO, those types of issues should be done by humans who want to get into datafusion.

Re: [PR] fix: prost build keda and TLS RPC example [datafusion-ballista]

2026-01-31 Thread via GitHub
killzoner commented on code in PR #1429: URL: https://github.com/apache/datafusion-ballista/pull/1429#discussion_r2749866435 ## .github/actions/setup-builder/action.yaml: ## @@ -18,6 +18,14 @@ name: Prepare Rust Builder description: 'Prepare Rust Build Environment' inputs: +

Re: [I] [EPIC] Improve Comet Native writer [datafusion-comet]

2026-01-31 Thread via GitHub
coderfender commented on issue #2967: URL: https://github.com/apache/datafusion-comet/issues/2967#issuecomment-3829074454 Creating new issues and tagging in this EPIC to address failing spark tests

Re: [PR] fix: use spark ParquetFilters [datafusion-comet]

2026-01-31 Thread via GitHub
mbutrovich closed pull request #2100: fix: use spark ParquetFilters URL: https://github.com/apache/datafusion-comet/pull/2100

Re: [PR] docs: Add contributor guide page for SQL file tests [datafusion-comet]

2026-01-31 Thread via GitHub
andygrove merged PR #: URL: https://github.com/apache/datafusion-comet/pull/

Re: [PR] feat: optimize CASE WHEN for divide-by-zero protection pattern [datafusion]

2026-01-31 Thread via GitHub
pepijnve commented on PR #19994: URL: https://github.com/apache/datafusion/pull/19994#issuecomment-3829789029 In the microbenchmark, I think it might be preferable to use `!= 0` rather than `> 0`. With `> 0`, even for the '0% zeroes' benchmark, ~50% of the values are negative and do not mat

Re: [PR] bug: Fix string decimal type throw right exception [datafusion-comet]

2026-01-31 Thread via GitHub
coderfender commented on PR #3248: URL: https://github.com/apache/datafusion-comet/pull/3248#issuecomment-3829817962 Thank you for the approval @andygrove. Please merge whenever you get a chance @andygrove

Re: [PR] bug: Fix string decimal type throw right exception [datafusion-comet]

2026-01-31 Thread via GitHub
andygrove merged PR #3248: URL: https://github.com/apache/datafusion-comet/pull/3248

Re: [PR] feat: Extract `execution_graph` to a trait [datafusion-ballista]

2026-01-31 Thread via GitHub
danielhumanmod commented on code in PR #1361: URL: https://github.com/apache/datafusion-ballista/pull/1361#discussion_r2750285664 ## ballista/scheduler/src/cluster/mod.rs: ## @@ -311,12 +313,15 @@ pub trait JobState: Send + Sync { /// /// The job may not belong to the

[PR] Improve performance of `CASE WHEN x THEN y ELSE NULL` expressions [datafusion]

2026-01-31 Thread via GitHub
pepijnve opened a new pull request, #20097: URL: https://github.com/apache/datafusion/pull/20097 ## Which issue does this PR close? - Related to #11570 ## Rationale for this change While reviewing #19994 it became clear the optimised `ExpressionOrExpression` code path wa

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
comphead commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749738971 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

[I] Add anchor links to individual config settings in Configuration Settings docs page [datafusion]

2026-01-31 Thread via GitHub
Omega359 opened a new issue, #20094: URL: https://github.com/apache/datafusion/issues/20094 ### Is your feature request related to a problem or challenge? It would be very useful to be able to have links directly to individual config settings for sending to others (such as coworkers)

Re: [I] Add anchor links to individual config settings in Configuration Settings docs page [datafusion]

2026-01-31 Thread via GitHub
Omega359 commented on issue #20094: URL: https://github.com/apache/datafusion/issues/20094#issuecomment-3828866811 take

Re: [PR] Implement preimage for floor function to enable predicate pushdown [datafusion]

2026-01-31 Thread via GitHub
comphead commented on PR #20059: URL: https://github.com/apache/datafusion/pull/20059#issuecomment-3828809831 @devanshu0987 please address changes for `debug_assert` and https://github.com/apache/datafusion/pull/20059#discussion_r2747443963 and I think the PR is good to go

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749748658 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749749546 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Improve sort-based shuffle: single spill file per partition and batch coalescing [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove commented on PR #1431: URL: https://github.com/apache/datafusion-ballista/pull/1431#issuecomment-3828911163 @sqlbenchmark criterion --bench sort_shuffle

Re: [PR] Add criterion benchmarks for sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove commented on PR #1434: URL: https://github.com/apache/datafusion-ballista/pull/1434#issuecomment-3828911602 @sqlbenchmark criterion --bench sort_shuffle

Re: [PR] chore: Migrate `concat` tests to sql based testing framework [datafusion-comet]

2026-01-31 Thread via GitHub
codecov-commenter commented on PR #3352: URL: https://github.com/apache/datafusion-comet/pull/3352#issuecomment-3829030511 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3352?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Add AI tooling disclosure text to contributor guide and fields to PR templates [datafusion]

2026-01-31 Thread via GitHub
milenkovicm commented on issue #18095: URL: https://github.com/apache/datafusion/issues/18095#issuecomment-3829033263 Looks like ffmpeg developers have similar issues https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21595#issuecomment-23753

Re: [PR] Implement preimage for floor function to enable predicate pushdown [datafusion]

2026-01-31 Thread via GitHub
devanshu0987 commented on code in PR #20059: URL: https://github.com/apache/datafusion/pull/20059#discussion_r2750339228 ## datafusion/functions/src/math/floor.rs: ## @@ -200,7 +203,242 @@ impl ScalarUDFImpl for FloorFunc { Interval::make_unbounded(&data_type) }

Re: [PR] Add heap memory estimation for statistics [datafusion]

2026-01-31 Thread via GitHub
mkleen commented on PR #19599: URL: https://github.com/apache/datafusion/pull/19599#issuecomment-3829312819 @adriangb Thanks for the feedback. What are your thoughts on @alchemist51’s suggestion to move this into the cache folder, since it will only be used there?

[PR] Add anchor links to individual config settings in Configuration Settings docs page [datafusion]

2026-01-31 Thread via GitHub
Omega359 opened a new pull request, #20095: URL: https://github.com/apache/datafusion/pull/20095 ## Which issue does this PR close? - Closes #20094 ## Rationale for this change direct links to config items ## What changes are included in this PR? doc

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749769487 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749731114 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -702,6 +702,11 @@ impl CSEController for ExprCSEController<'_> { #[expect(deprecated)]

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
comphead commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749751391 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] chore: Adapt caching from #3251 to [iceberg] workflows [datafusion-comet]

2026-01-31 Thread via GitHub
codecov-commenter commented on PR #3353: URL: https://github.com/apache/datafusion-comet/pull/3353#issuecomment-3829248864 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3353?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add criterion benchmarks for sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove commented on PR #1434: URL: https://github.com/apache/datafusion-ballista/pull/1434#issuecomment-3829250402 @sqlbenchmark criterion --bench sort_shuffle

Re: [PR] Timezone aware extract SQL expression [datafusion]

2026-01-31 Thread via GitHub
Omega359 commented on code in PR #18990: URL: https://github.com/apache/datafusion/pull/18990#discussion_r2750083691 ## datafusion/functions/src/datetime/common.rs: ## @@ -23,22 +23,89 @@ use arrow::array::{ StringArrayType, StringViewArray, }; use arrow::compute::Decimal

Re: [PR] Fix parsing of :: cast after parenthesized DEFAULT expression [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
isaacparker0 commented on code in PR #2168: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2168#discussion_r2750485742 ## src/parser/mod.rs: ## @@ -9002,7 +9002,15 @@ impl<'a> Parser<'a> { /// [ColumnOption::NotNull]. fn parse_column_option_expr(&mut self

Re: [I] Binary string (`BYTEA`, `Binary`) concatenation [datafusion]

2026-01-31 Thread via GitHub
devanshu0987 commented on issue #12709: URL: https://github.com/apache/datafusion/issues/12709#issuecomment-3830299845 > One thing to keep in mind is how we handle binary || string concat (i.e. which common type we coerce to); we can't assume all binary is a valid string, but strings are al

Re: [PR] Fix parsing of :: cast after parenthesized DEFAULT expression [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
isaacparker0 commented on code in PR #2168: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2168#discussion_r2750486616 ## tests/sqlparser_common.rs: ## @@ -17376,6 +17302,11 @@ fn test_parse_not_null_in_column_options() { ); } +#[test] +fn test_parse_defaul

Re: [PR] MSSQL: Support standalone BEGIN...END blocks [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
guan404ming commented on code in PR #2186: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2186#discussion_r2749179803 ## src/dialect/mssql.rs: ## @@ -145,7 +145,22 @@ impl Dialect for MsSqlDialect { } fn parse_statement(&self, parser: &mut Parser) -> Op

Re: [PR] MSSQL: Support standalone BEGIN...END blocks [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
guan404ming commented on code in PR #2186: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2186#discussion_r2749180710 ## src/dialect/mssql.rs: ## @@ -145,7 +145,22 @@ impl Dialect for MsSqlDialect { } fn parse_statement(&self, parser: &mut Parser) -> Op

[PR] feat(python): Improve Jupyter notebook support with SQL magic commands and examples [datafusion-ballista]

2026-01-31 Thread via GitHub
littleKitchen opened a new pull request, #1430: URL: https://github.com/apache/datafusion-ballista/pull/1430 ## Summary This PR implements the improvements outlined in #1398 to enhance the Jupyter notebook experience for Ballista. ## Implementation Checklist All items fr

Re: [PR] [DRAFT] Extension Type Registry Draft [datafusion]

2026-01-31 Thread via GitHub
tobixdev commented on PR #18552: URL: https://github.com/apache/datafusion/pull/18552#issuecomment-3828039403 Also a big thanks from my side! I am still on vacation next week but happy to further this project afterwards. Meanwhile, if you're motivated to work on this in the meantime

Re: [PR] [Oracle] Support hierarchical queries [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
xitep commented on code in PR #2185: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2185#discussion_r2749282720 ## src/parser/mod.rs: ## @@ -14185,25 +14185,35 @@ impl<'a> Parser<'a> { /// Parse a `CONNECT BY` clause (Oracle-style hierarchical query support)

[I] Add sqllogictest coverage for UDWF return types in information_schema [datafusion]

2026-01-31 Thread via GitHub
AndreaBozzo opened a new issue, #20090: URL: https://github.com/apache/datafusion/issues/20090 ### Context In #20079, the `information_schema` was updated to use `return_field_from_args` / `return_field` / `WindowUDFFieldArgs::field` instead of the older `return_type` API for UDFs, U

Re: [PR] Use return_field_from_args in information schema and date_trunc [datafusion]

2026-01-31 Thread via GitHub
AndreaBozzo commented on code in PR #20079: URL: https://github.com/apache/datafusion/pull/20079#discussion_r2749329461 ## datafusion/catalog/src/information_schema.rs: ## @@ -473,12 +500,26 @@ fn get_udwf_args_and_return_types( Ok(arg_types .into_iter()

Re: [I] Add interactive TUI (Text User Interface) for cluster monitoring [datafusion-ballista]

2026-01-31 Thread via GitHub
martin-g commented on issue #1396: URL: https://github.com/apache/datafusion-ballista/issues/1396#issuecomment-3828203047 I already have a working app with the Dashboard as a separate crate - ballista-tui, but I will merge it into the ballista-cli crate!

Re: [I] Add interactive TUI (Text User Interface) for cluster monitoring [datafusion-ballista]

2026-01-31 Thread via GitHub
milenkovicm commented on issue #1396: URL: https://github.com/apache/datafusion-ballista/issues/1396#issuecomment-3828232205 Thanks a lot. That would save us another crate to release!

Re: [I] TPC-DS query 76 becomes much slower as target_partitions goes up [datafusion]

2026-01-31 Thread via GitHub
Dandandan commented on issue #20078: URL: https://github.com/apache/datafusion/issues/20078#issuecomment-3828497932 FYI @adriangb

Re: [I] TPC-DS query 76 becomes much slower as target_partitions goes up [datafusion]

2026-01-31 Thread via GitHub
Dandandan commented on issue #20078: URL: https://github.com/apache/datafusion/issues/20078#issuecomment-3828501668 Apart from making things more efficient, it might be a good idea to adjust the `target_partitions` for each node based on the input size (now it's either no repartition for sm

Re: [PR] Add `truncated_rows` parameter to `register_csv()` and `read_csv()` [datafusion-python]

2026-01-31 Thread via GitHub
timsaucer commented on PR #1359: URL: https://github.com/apache/datafusion-python/pull/1359#issuecomment-3828504117 > @timsaucer i am not comfortable yet with this whole thing, I know what you want :) and it make perfect sense, but i don't want to get too excited and do silly thing yet :)

Re: [PR] Add heap memory estimation for statistics [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2678644702 ## datafusion/common/src/heap_size.rs: ## @@ -0,0 +1,454 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

[PR] Ballista Text User Interface app [datafusion-ballista]

2026-01-31 Thread via GitHub
martin-g opened a new pull request, #1433: URL: https://github.com/apache/datafusion-ballista/pull/1433 # Which issue does this PR close? Part of #1396. # Rationale for this change See #1396 # What changes are included in this PR? Initial version of the TUI

Re: [I] Use interleave_record_batch to avoid tiny batches in sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
Dandandan commented on issue #1432: URL: https://github.com/apache/datafusion-ballista/issues/1432#issuecomment-3828441839 https://docs.rs/arrow/latest/arrow/compute/struct.BatchCoalescer.html#method.push_batch_with_indices

Re: [I] Use interleave_record_batch to avoid tiny batches in sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
Dandandan commented on issue #1432: URL: https://github.com/apache/datafusion-ballista/issues/1432#issuecomment-3828438963 `interleave` is relatively slow compared to `coalesce` for in order results.

Re: [PR] refactor: Rename `FileSource::try_reverse_output` to `FileSource::try_pushdown_sort` [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on code in PR #20043: URL: https://github.com/apache/datafusion/pull/20043#discussion_r2749546516 ## datafusion/datasource/src/file.rs: ## @@ -189,7 +189,29 @@ pub trait FileSource: Send + Sync { /// * `Inexact` - Created a source optimized for ordering (

Re: [PR] refactor: Rename `FileSource::try_reverse_output` to `FileSource::try_pushdown_sort` [datafusion]

2026-01-31 Thread via GitHub
kumarUjjawal commented on code in PR #20043: URL: https://github.com/apache/datafusion/pull/20043#discussion_r2749551072 ## datafusion/datasource/src/file.rs: ## @@ -189,7 +189,29 @@ pub trait FileSource: Send + Sync { /// * `Inexact` - Created a source optimized for orderi

[PR] refactor: Change TableScan.projection from indices to expressions [datafusion]

2026-01-31 Thread via GitHub
adriangb opened a new pull request, #20091: URL: https://github.com/apache/datafusion/pull/20091 ## Motivation Currently, `TableScan` stores projections as column indices (`Option<Vec<usize>>`) which requires constant conversion between indices and expressions throughout the codebase. By storin

Re: [PR] Ballista Text User Interface app [datafusion-ballista]

2026-01-31 Thread via GitHub
martin-g commented on PR #1433: URL: https://github.com/apache/datafusion-ballista/pull/1433#issuecomment-3828534968 The Dashboard view: https://github.com/user-attachments/assets/767bdc76-b35a-4f15-a5f3-baf0646673e3 The help popup: https://github.com/user-attachments/asse

Re: [PR] chore(deps): bump object_store from 0.12.5 to 0.13.1 in /native [datafusion-comet]

2026-01-31 Thread via GitHub
mbutrovich closed pull request #3283: chore(deps): bump object_store from 0.12.5 to 0.13.1 in /native URL: https://github.com/apache/datafusion-comet/pull/3283

Re: [PR] chore(deps): bump object_store from 0.12.5 to 0.13.1 in /native [datafusion-comet]

2026-01-31 Thread via GitHub
dependabot[bot] commented on PR #3283: URL: https://github.com/apache/datafusion-comet/pull/3283#issuecomment-3828538291 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor ve

Re: [PR] Use return_field_from_args in information schema and date_trunc [datafusion]

2026-01-31 Thread via GitHub
AndreaBozzo commented on code in PR #20079: URL: https://github.com/apache/datafusion/pull/20079#discussion_r2749593085 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -223,27 +223,21 @@ impl ScalarUDFImpl for DateTruncFunc { &self.signature } -// k

Re: [PR] Automatically generate examples documentation adv (#19294) [datafusion]

2026-01-31 Thread via GitHub
Jefffrey commented on PR #19750: URL: https://github.com/apache/datafusion/pull/19750#issuecomment-3828588451 Thanks @cj-zhukov

Re: [PR] Automatically generate examples documentation adv (#19294) [datafusion]

2026-01-31 Thread via GitHub
Jefffrey merged PR #19750: URL: https://github.com/apache/datafusion/pull/19750

Re: [PR] Use return_field_from_args in information schema and date_trunc [datafusion]

2026-01-31 Thread via GitHub
martin-g commented on code in PR #20079: URL: https://github.com/apache/datafusion/pull/20079#discussion_r2749573305 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -223,27 +223,21 @@ impl ScalarUDFImpl for DateTruncFunc { &self.signature } -// keep

[PR] Consolidate filter classification into physical planner [datafusion]

2026-01-31 Thread via GitHub
adriangb opened a new pull request, #20092: URL: https://github.com/apache/datafusion/pull/20092 ## Which issue does this PR close? Related to: - https://github.com/apache/datafusion/issues/19894 - Unified `TableScan.filters` representation - https://github.com/apache/datafusion

Re: [I] TPC-DS query 76 becomes much slower as target_partitions goes up [datafusion]

2026-01-31 Thread via GitHub
adriangb commented on issue #20078: URL: https://github.com/apache/datafusion/issues/20078#issuecomment-3828565256 Another thing we can consider is some sort of cache for expression simplification. In particular given the same input expression, same physical file layout, etc. both `Physical

[PR] Refactor `iszero()` and `isnan()` to accept all numeric types [datafusion]

2026-01-31 Thread via GitHub
kumarUjjawal opened a new pull request, #20093: URL: https://github.com/apache/datafusion/pull/20093 ## Which issue does this PR close? - Closes #20089 ## Rationale for this change iszero() and isnan() previously accepted “numeric” inputs by implicitly coercing t

Re: [PR] Refactor `iszero()` and `isnan()` to accept all numeric types [datafusion]

2026-01-31 Thread via GitHub
Jefffrey commented on PR #20093: URL: https://github.com/apache/datafusion/pull/20093#issuecomment-3828622953 For decimal is_zero we need to consider the scale. E.g. negative scale can never be zero -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [PR] Add heap memory estimation for statistics [datafusion]

2026-01-31 Thread via GitHub
mkleen commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2749268163 ## datafusion/common/src/heap_size.rs: ## @@ -0,0 +1,454 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] Support `EXPLAIN ANALYZE` in Ballista [datafusion-ballista]

2026-01-31 Thread via GitHub
milenkovicm commented on issue #1344: URL: https://github.com/apache/datafusion-ballista/issues/1344#issuecomment-3827939026 Thanks @danielhumanmod will have a look asap -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [I] Add interactive TUI (Text User Interface) for cluster monitoring [datafusion-ballista]

2026-01-31 Thread via GitHub
milenkovicm commented on issue #1396: URL: https://github.com/apache/datafusion-ballista/issues/1396#issuecomment-3827946592 Can we make it as part of ballista cli? We could have special command to enter tui mode. This would simplify release as well as we deliver one single binary

Re: [I] Optimize spark sha2 [datafusion]

2026-01-31 Thread via GitHub
Jefffrey commented on issue #20046: URL: https://github.com/apache/datafusion/issues/20046#issuecomment-3827945085 Not for the Spark version -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] fix(expr): coerce literal arguments in return_field_from_args for UDFs [datafusion]

2026-01-31 Thread via GitHub
Trikooo commented on code in PR #20012: URL: https://github.com/apache/datafusion/pull/20012#discussion_r2749230588 ## datafusion/expr/src/expr_schema.rs: ## @@ -598,13 +598,32 @@ impl ExprSchemable for Expr { ) })?; -

Re: [I] More specific `COLLATE` enum [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
LucaCappelletti94 closed issue #2159: More specific `COLLATE` enum URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2159 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Use return_field_from_args in information schema and date_trunc [datafusion]

2026-01-31 Thread via GitHub
Jefffrey commented on code in PR #20079: URL: https://github.com/apache/datafusion/pull/20079#discussion_r2749229459 ## datafusion/catalog/src/information_schema.rs: ## @@ -473,12 +500,26 @@ fn get_udwf_args_and_return_types( Ok(arg_types .into_iter()

Re: [PR] Fix parsing of :: cast after parenthesized DEFAULT expression [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
iffyio commented on code in PR #2168: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2168#discussion_r2749386183 ## src/parser/mod.rs: ## @@ -9002,7 +9002,15 @@ impl<'a> Parser<'a> { /// [ColumnOption::NotNull]. fn parse_column_option_expr(&mut self) -> R

Re: [PR] Add `truncated_rows` parameter to `register_csv()` and `read_csv()` [datafusion-python]

2026-01-31 Thread via GitHub
djouallah commented on PR #1359: URL: https://github.com/apache/datafusion-python/pull/1359#issuecomment-3828144437 @timsaucer i am not comfortable yet with this whole thing, I know what you want :) and it make perfect sense, but i don't want to get too excited and do silly thing yet :)

[PR] Moved more structs outside of Statement to facilitate reuse [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
LucaCappelletti94 opened a new pull request, #2188: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2188 This PR refactors several `Statement` enum variants into their own dedicated structs. This follows the pattern of recent refactors to improve the modularity and type safety o

Re: [PR] [Oracle] Support hierarchical queries [datafusion-sqlparser-rs]

2026-01-31 Thread via GitHub
xitep commented on code in PR #2185: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2185#discussion_r2749282720 ## src/parser/mod.rs: ## @@ -14185,25 +14185,35 @@ impl<'a> Parser<'a> { /// Parse a `CONNECT BY` clause (Oracle-style hierarchical query support)

Re: [PR] Add `truncated_rows` parameter to `register_csv()` and `read_csv()` [datafusion-python]

2026-01-31 Thread via GitHub
timsaucer commented on PR #1359: URL: https://github.com/apache/datafusion-python/pull/1359#issuecomment-3828138464 Thank you for the PR. How would you feel about making a more general solution as described in #1358 ? If we're updating this, we could ensure we have all of the options expos

[PR] fix: use single spill file per partition in sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove opened a new pull request, #1431: URL: https://github.com/apache/datafusion-ballista/pull/1431 ## Summary - Keeps one `StreamWriter` open per output partition in `SpillManager`, appending across multiple spill calls instead of creating a new file each time - Reduces file

[I] Use interleave_record_batch to avoid tiny batches in sort-based shuffle [datafusion-ballista]

2026-01-31 Thread via GitHub
andygrove opened a new issue, #1432: URL: https://github.com/apache/datafusion-ballista/issues/1432 ## Problem The current sort-based shuffle writer uses DataFusion's `BatchPartitioner::partition()` which calls `take_arrays()` to split each input batch into per-partition sub-batches.

Re: [PR] Automatically generate examples documentation adv (#19294) [datafusion]

2026-01-31 Thread via GitHub
cj-zhukov commented on PR #19750: URL: https://github.com/apache/datafusion/pull/19750#issuecomment-3828421841 @Jefffrey I've resolved the conflicts -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Improve sort-based shuffle: single spill file per partition and batch coalescing [datafusion-ballista]

2026-01-31 Thread via GitHub
Dandandan commented on code in PR #1431: URL: https://github.com/apache/datafusion-ballista/pull/1431#discussion_r2749531369 ## ballista/core/src/execution_plans/sort_shuffle/buffer.rs: ## @@ -110,6 +111,77 @@ impl PartitionBuffer { pub fn take_batches(&mut self) -> Vec {

Re: [PR] feat: add ExpressionPlacement enum for optimizer expression placement decisions [datafusion]

2026-01-31 Thread via GitHub
comphead commented on code in PR #20065: URL: https://github.com/apache/datafusion/pull/20065#discussion_r2749731366 ## datafusion/expr-common/src/placement.rs: ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat: optimize CASE WHEN for divide-by-zero protection pattern [datafusion]

2026-01-31 Thread via GitHub
CuteChuanChuan commented on code in PR #19994: URL: https://github.com/apache/datafusion/pull/19994#discussion_r2749732276 ## datafusion/physical-expr/benches/case_when.rs: ## @@ -517,5 +519,106 @@ fn benchmark_lookup_table_case_when(c: &mut Criterion, batch_size: usize) {
