[PR] Provide more generic API for the capacity limits [datafusion]

2026-02-15 Thread via GitHub
erenavsarogullari opened a new pull request, #20372: URL: https://github.com/apache/datafusion/pull/20372 ## Which issue does this PR close? - Closes #20371. ## Rationale for this change Currently, `datafusion.runtime.max_temp_directory_size` is a disk based config but when it i

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905133276 I think you're not 100% following my point, but I'm not sure: * I believe that for TPCH / TPCDS (looking locally) the tables are generated based on the number of CPU cores

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905166961 🤖: Benchmark completed Details ``` Comparing HEAD and filter-pushdown-dynamic-bytes Benchmark tpch_sf1.json -

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905179031 But to be clear, I think a version that is at least able to switch the approach mid scan is strictly better. But since that requires API changes and coordination across crate

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905692013 show benchmark queue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] fix: unify ordering display with optimization path [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20362: URL: https://github.com/apache/datafusion/pull/20362#issuecomment-3905689818 @zhuqi-lucas could you review this change please?

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905692963 🤖 Hi @adriangb, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20363#issuecomment-3905692013). No pending jobs in `jobs/`.

Re: [PR] perf: optimize `array_distinct` with batched row conversion [datafusion]

2026-02-15 Thread via GitHub
lyne7-sc commented on code in PR #20364: URL: https://github.com/apache/datafusion/pull/20364#discussion_r2810352004 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -527,42 +531,52 @@ fn general_array_distinct( if array.is_empty() { return Ok(Arc::new(array.cl

Re: [I] Improve performance of `array_has` [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on issue #18181: URL: https://github.com/apache/datafusion/issues/18181#issuecomment-3906058209 The repro actually uses `array_has_any`, not `array_has`. Can we rename the title of this ticket for clarity?

Re: [I] Optimize `array_has()` for scalar needle [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on issue #20377: URL: https://github.com/apache/datafusion/issues/20377#issuecomment-3906062707 take

[I] Optimize `array_has()` for scalar needle [datafusion]

2026-02-15 Thread via GitHub
neilconway opened a new issue, #20377: URL: https://github.com/apache/datafusion/issues/20377 ### Is your feature request related to a problem or challenge? For the scalar needle case, array_has is reasonably fast but it could be optimized by avoiding the per-row work that is currentl

Re: [I] Optimize `array_has()` for scalar needle [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on issue #20377: URL: https://github.com/apache/datafusion/issues/20377#issuecomment-3906064528 See also #18181, although in that case the observed performance issue is actually with `array_has_any`.

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905199459 Ah ok - yeah that makes sense 👍

[PR] Set expected runtime config in error message when the used disk space during the spilling process has exceeded the allocation limit [datafusion]

2026-02-15 Thread via GitHub
erenavsarogullari opened a new pull request, #20375: URL: https://github.com/apache/datafusion/pull/20375 ## Which issue does this PR close? - Closes 20373. ## Rationale for this change Minor refactoring on error message by exposing required config name for the end user. This is

Re: [PR] perf: Optimize scalar fast path of atan2 [datafusion]

2026-02-15 Thread via GitHub
kumarUjjawal commented on code in PR #20336: URL: https://github.com/apache/datafusion/pull/20336#discussion_r2810983428 ## datafusion/functions/src/macros.rs: ## @@ -393,37 +394,76 @@ macro_rules! make_math_binary_udf { &self, args: Sca

[PR] [Minor] Update object_store to 0.12.5 [datafusion]

2026-02-15 Thread via GitHub
Dandandan opened a new pull request, #20378: URL: https://github.com/apache/datafusion/pull/20378 ## Which issue does this PR close? ## Rationale for this change Keep up to date. I saw when looking at https://github.com/apache/datafusion/issues/20325 we were still at 0.12.4

Re: [PR] [Minor] Update object_store to 0.12.5 [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20378: URL: https://github.com/apache/datafusion/pull/20378#issuecomment-3906877330 run benchmark clickbench_partitioned

Re: [PR] [Minor] Update object_store to 0.12.5 [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20378: URL: https://github.com/apache/datafusion/pull/20378#issuecomment-3906878700 run benchmark clickbench_partitioned DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905154744 I do get your point. TPCH / TPCDS will essentially not use late materialization off/ `RowFilter` because like you say all files are opened at once. > Because a disabled

Re: [PR] [datafusion-spark] Implement map function [datafusion]

2026-02-15 Thread via GitHub
xanderbailey commented on PR #20358: URL: https://github.com/apache/datafusion/pull/20358#issuecomment-3905215154 Warning that spark has `spark.sql.mapKeyDedupPolicy` ``` spark.sql.mapKeyDedupPolicy | EXCEPTION | The policy to deduplicate map keys in builtin function: CreateMap, M

Re: [PR] feat: Add spark compatible `MapSort` function along with limited support for grouping on Map type [datafusion-comet]

2026-02-15 Thread via GitHub
github-actions[bot] commented on PR #2221: URL: https://github.com/apache/datafusion-comet/pull/2221#issuecomment-3906107507 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [PR] feat: Add JNI-based Hadoop FileSystem support for S3 and other Hadoop-compatible stores [datafusion-comet]

2026-02-15 Thread via GitHub
github-actions[bot] commented on PR #1992: URL: https://github.com/apache/datafusion-comet/pull/1992#issuecomment-3906107575 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [PR] feat: [1941-Part2]: Introduce map_to_list scalar function [datafusion-comet]

2026-02-15 Thread via GitHub
github-actions[bot] commented on PR #2312: URL: https://github.com/apache/datafusion-comet/pull/2312#issuecomment-3906107436 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [PR] fix: incorrect results when using NOT physical expression in physical… [datafusion]

2026-02-15 Thread via GitHub
evangelisilva commented on PR #20138: URL: https://github.com/apache/datafusion/pull/20138#issuecomment-3905070967 @berkaysynnada I have updated the PR based on your feedback: * **Fixed Formatting**: Ran `cargo fmt` to ensure everything aligns with the project's style. * **Clea

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905071439 See https://github.com/apache/datafusion/pull/20160#issuecomment-3905053370 I think approaches to adaptiveness/selectivity tracking also need to work _during_ file scan

Re: [PR] Validate coerce int96 config 17498 [datafusion]

2026-02-15 Thread via GitHub
AlyAbdelmoneim commented on PR #20253: URL: https://github.com/apache/datafusion/pull/20253#issuecomment-3906679565 Hi @Jefffrey, the test failure I’m seeing doesn’t seem related to my changes, it still occurs on the `main` branch. You can reproduce it by running: ```bash cargo

Re: [PR] perf: optimize `array_distinct` with batched row conversion [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on code in PR #20364: URL: https://github.com/apache/datafusion/pull/20364#discussion_r2809091594 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -527,42 +531,52 @@ fn general_array_distinct( if array.is_empty() { return Ok(Arc::new(array.c

Re: [PR] Add a memory bound FileStatisticsCache for the Listing Table [datafusion]

2026-02-15 Thread via GitHub
mkleen commented on PR #20047: URL: https://github.com/apache/datafusion/pull/20047#issuecomment-3904420250 @nuno-faria I really appreciate the thorough feedback - it’s very helpful. I’ll dig into this.

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
alamb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3904422160 > Maybe I'm just being biased here but I personally think the expensive to evaluate but not helpful join dynamic filters are just a pathological case of "expensive low selectivity

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
alamb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3904425520 Thinking more about this, I wonder if we could model the choice of where to evaluate a filter as a dynamic filter 🤔 Aka make two filters for each predicate * The one in

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3904432303 > Thinking more about this, I wonder if we could model the choice of where to evaluate a filter as a dynamic filter 🤔 > > Aka make two filters for each predicate >

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3904434055 > > Maybe I'm just being biased here but I personally think the expensive to evaluate but not helpful join dynamic filters are just a pathological case of "expensive low select

Re: [I] Incorrect cast of integer columns to utf8 when comparing with utf8 constant [datafusion]

2026-02-15 Thread via GitHub
AlonSpivack commented on issue #15161: URL: https://github.com/apache/datafusion/issues/15161#issuecomment-3904440723 I'm hitting this bug in production on DataFusion v52, and it's causing silently incorrect query results across multiple scenarios. I want to add important context beyond

Re: [I] Avoid recompute CTEs (common table expressions) / share input plans [datafusion]

2026-02-15 Thread via GitHub
suibianwanwank commented on issue #8777: URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3904440991 > Yes. Just restating the deadlock concern from the issue description. I generated this diagram to show the problem - if the probe side buffer is full, the "FanoutExec" is

Re: [I] Support sqllogictest output coloring [datafusion]

2026-02-15 Thread via GitHub
theirix commented on issue #20367: URL: https://github.com/apache/datafusion/issues/20367#issuecomment-390713 take

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904523263 > Wonder if I'm infinite looping it or something :( Yes I think previously it got stuck during infinite loops / extremely long running tasks.

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904526739 > > Wonder if I'm infinite looping it or something :( > > Yes I think previously it got stuck during infinite loops / extremely long running tasks. My bad I’ll try to a

[PR] feat: support sqllogictest output coloring [datafusion]

2026-02-15 Thread via GitHub
theirix opened a new pull request, #20368: URL: https://github.com/apache/datafusion/pull/20368 ## Which issue does this PR close? - Closes #20367. ## Rationale for this change It's more ergonomic to have colored diffs in sqllogictest's output. The upstream librar

[I] Support sqllogictest output coloring [datafusion]

2026-02-15 Thread via GitHub
theirix opened a new issue, #20367: URL: https://github.com/apache/datafusion/issues/20367 ### Is your feature request related to a problem or challenge? It would be nice to have colored diffs in sqllogictest's output - easier to see differences. ### Describe the solution you'd

Re: [PR] fix: handle Utf8View and LargeUtf8 separators in concat_ws [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on code in PR #20361: URL: https://github.com/apache/datafusion/pull/20361#discussion_r2809204758 ## datafusion/functions/src/string/concat_ws.rs: ## @@ -546,4 +564,41 @@ mod tests { Ok(()) } + +#[test] +fn concat_ws_utf8view_scalar_s

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904317167 FYI @alamb > Hm it seems stuck again

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904383525 @Dandandan this is mostly vibe coded, I'm only 50% confident it even makes sense without reviewing the code fwiw

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3904395716 > Make dynamic filters pruning-only for the moment (behind a flag) and only push down static filters to the parquet reader (i.e. await results here https://github.com/apache/da

Re: [PR] Add a memory bound FileStatisticsCache for the Listing Table [datafusion]

2026-02-15 Thread via GitHub
nuno-faria commented on code in PR #20047: URL: https://github.com/apache/datafusion/pull/20047#discussion_r2809240718 ## datafusion-cli/src/main.rs: ## @@ -647,9 +644,9 @@ mod tests { +---+-+-+--+

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904489674 show benchmark queue

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904490794 Wonder if I'm infinite looping it or something :(

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904489818 🤖 Hi @adriangb, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20363#issuecomment-3904489674). | Job | User | Benchmarks | Comment | |

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3904610347 > Thinking more about this, I wonder if we could model the choice of where to evaluate a filter as a dynamic filter 🤔 > > Aka make two filters for each predicate >

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3903918629 show benchmark queue

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3903918963 🤖 Hi @Dandandan, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20363#issuecomment-3903918629). | Job | User | Benchmarks | Comment | |

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3903920732 Hm it seems stuck again

[PR] Refactor ordered-set aggregate Dataframe APIs to align with SQL(#18279) [datafusion]

2026-02-15 Thread via GitHub
cj-zhukov opened a new pull request, #20366: URL: https://github.com/apache/datafusion/pull/20366 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/18279. ## Rationale for this change ## What changes are included in t

Re: [PR] Refactor ordered-set aggregate Dataframe APIs to align with SQL(#18279) [datafusion]

2026-02-15 Thread via GitHub
cj-zhukov commented on PR #20366: URL: https://github.com/apache/datafusion/pull/20366#issuecomment-3903865617 ### High-Level Overview This PR refactors the three percentile functions: - `percentile_cont` - `approx_percentile_cont` - `approx_percentile_cont_with_weight` Ch

Re: [PR] fix: handle Utf8View and LargeUtf8 separators in concat_ws [datafusion]

2026-02-15 Thread via GitHub
xanderbailey commented on code in PR #20361: URL: https://github.com/apache/datafusion/pull/20361#discussion_r2809008987 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -504,6 +504,19 @@ abc statement ok drop table foo +# concat_ws with a Utf8View column as separator +

Re: [PR] perf: Optimize concat() UDF [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on PR #20317: URL: https://github.com/apache/datafusion/pull/20317#issuecomment-3904688713 @Jefffrey Is this okay to land in `main`, do you think? Lmk if you have other feedback or concerns.

Re: [PR] perf: Optimize lpad, rpad for ASCII strings [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on PR #20278: URL: https://github.com/apache/datafusion/pull/20278#issuecomment-3904687970 @martin-g Is this okay to land in `main`, do you think? Lmk if you have other feedback or concerns.

Re: [I] Grouping statement enum by macro-categories [datafusion-sqlparser-rs]

2026-02-15 Thread via GitHub
xitep commented on issue #2218: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2218#issuecomment-3904733068 i believe your suggestion could lead to reducing the size of the `Statement` (it's about 2kb right now :/)

[PR] Add support for parquet field [datafusion]

2026-02-15 Thread via GitHub
SubhamSinghal opened a new pull request, #20370: URL: https://github.com/apache/datafusion/pull/20370 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion-comet/issues/3434 ## Rationale for this change Adding support for reading parquet field_i

Re: [PR] MSSQL: Add support for TRAN shorthand [datafusion-sqlparser-rs]

2026-02-15 Thread via GitHub
guan404ming commented on PR #2212: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2212#issuecomment-3904671284 Thanks!

[PR] feat: add LazyPartition trait for LazyMemoryExec, migrate generate_series to partitions [datafusion]

2026-02-15 Thread via GitHub
ethan-tyler opened a new pull request, #20369: URL: https://github.com/apache/datafusion/pull/20369 ## Which issue does this PR close? - Closes #13614 ## Rationale for this change `LazyMemoryExec` currently uses generator closures (`LazyBatchGenerator`) as its partition

Re: [PR] fix: IS NULL doesn't type-check its input and panic [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on PR #20306: URL: https://github.com/apache/datafusion/pull/20306#issuecomment-3904739530 Proposal: 1. Remove the early return from `coerce_arguments_for_signature()`, because that seems wrong in any case. (I'm happy to send a PR for this or you can, @Acfboy --

Re: [I] Floor built in function return type [datafusion]

2026-02-15 Thread via GitHub
theirix commented on issue #8795: URL: https://github.com/apache/datafusion/issues/8795#issuecomment-3904855285 The floor and ceil udfs now support decimal return type via #18979. Non-trivial functions like pow, log, etc. are also supported (linked in the parent EPIC). So I thi

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904954170 show benchmark queue

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3904954291 🤖 Hi @adriangb, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20363#issuecomment-3904954170). | Job | User | Benchmarks | Comment | |

Re: [PR] feat: adaptive filter selectivity tracking for Parquet row filters [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on code in PR #19639: URL: https://github.com/apache/datafusion/pull/19639#discussion_r2809847688 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -654,6 +711,211 @@ pub fn build_row_filter( .map(|filters| Some(RowFilter::new(filters))) }

Re: [PR] feat: Add selectivity-tracking wrapper for dynamic filters [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on PR #20160: URL: https://github.com/apache/datafusion/pull/20160#issuecomment-3905053370 > [#20160 (comment)](https://github.com/apache/datafusion/pull/20160#issuecomment-3902329306) > > This is the main improvement. Ok - yes I see some improvements here a

[PR] chore(deps): bump tonic-prost from 0.14.3 to 0.14.4 [datafusion-ballista]

2026-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #1454: URL: https://github.com/apache/datafusion-ballista/pull/1454 Bumps [tonic-prost](https://github.com/hyperium/tonic) from 0.14.3 to 0.14.4. Release notes Sourced from https://github.com/hyperium/tonic/releases: tonic-prost's release

[PR] chore(deps): bump libc from 0.2.181 to 0.2.182 [datafusion-ballista]

2026-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #1456: URL: https://github.com/apache/datafusion-ballista/pull/1456 Bumps [libc](https://github.com/rust-lang/libc) from 0.2.181 to 0.2.182. Release notes Sourced from https://github.com/rust-lang/libc/releases: libc's releases. 0.2

[PR] chore(deps): bump tonic-build from 0.14.3 to 0.14.4 [datafusion-ballista]

2026-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #1457: URL: https://github.com/apache/datafusion-ballista/pull/1457 Bumps [tonic-build](https://github.com/hyperium/tonic) from 0.14.3 to 0.14.4. Release notes Sourced from https://github.com/hyperium/tonic/releases: tonic-build's release

[PR] chore(deps): bump tonic from 0.14.3 to 0.14.4 [datafusion-ballista]

2026-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #1455: URL: https://github.com/apache/datafusion-ballista/pull/1455 Bumps [tonic](https://github.com/hyperium/tonic) from 0.14.3 to 0.14.4. Release notes Sourced from https://github.com/hyperium/tonic/releases: tonic's releases. v0.

[PR] chore(deps): bump uuid from 1.20.0 to 1.21.0 [datafusion-ballista]

2026-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #1458: URL: https://github.com/apache/datafusion-ballista/pull/1458 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.20.0 to 1.21.0. Release notes Sourced from https://github.com/uuid-rs/uuid/releases: uuid's releases. v1.21.0

[PR] chore(deps): bump tonic-prost-build from 0.14.3 to 0.14.4 [datafusion-ballista]

2026-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #1459: URL: https://github.com/apache/datafusion-ballista/pull/1459 Bumps [tonic-prost-build](https://github.com/hyperium/tonic) from 0.14.3 to 0.14.4. Release notes Sourced from https://github.com/hyperium/tonic/releases: tonic-prost-bui

Re: [PR] fix: IS NULL doesn't type-check its input and panic [datafusion]

2026-02-15 Thread via GitHub
Acfboy commented on PR #20306: URL: https://github.com/apache/datafusion/pull/20306#issuecomment-3906408699 Thanks @neilconway ! You are right. I have changed the PR and re-requested reviews.

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905097162 > See [#20160 (comment)](https://github.com/apache/datafusion/pull/20160#issuecomment-3905053370) > > I think approaches to adaptiveness/selectivity tracking also need to

Re: [I] Provide more generic API for the capacity limits [datafusion]

2026-02-15 Thread via GitHub
erenavsarogullari commented on issue #20371: URL: https://github.com/apache/datafusion/issues/20371#issuecomment-3905109121 take

[I] Provide more generic API for the capacity limits [datafusion]

2026-02-15 Thread via GitHub
erenavsarogullari opened a new issue, #20371: URL: https://github.com/apache/datafusion/issues/20371 ### Is your feature request related to a problem or challenge? Currently, `datafusion.runtime.max_temp_directory_size` is a disk based config but when it is set as `invalid limit` or `

[PR] feat: add coerce_arguments flag to UDTFs to allow skipping automatic … [datafusion]

2026-02-15 Thread via GitHub
evangelisilva opened a new pull request, #20376: URL: https://github.com/apache/datafusion/pull/20376 # UDTF Argument Coercion Suppression ## Which issue does this PR close? Closes #20293. ## Rationale for this change Currently, User-Defined Table Functions (UDTFs)

Re: [I] [Feature] Support Spark expression: map_filter [datafusion-comet]

2026-02-15 Thread via GitHub
CuteChuanChuan commented on issue #3165: URL: https://github.com/apache/datafusion-comet/issues/3165#issuecomment-3906750613 I have a question about implementing map_filter. Spark's `map_filter` requires lambda syntax: ```sql SELECT map_filter(map(1, 0, 2, 2), (k, v) -> k > v)

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905143729 run benchmark tpch

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905144070 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905144081 I restarted the job runner.

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905145488 It might be time to invest in a more legit benchmark runner strategy -- my bash script nest is not super reliable.

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905798773 🤖 Hi @adriangb, thanks for the request (https://github.com/apache/datafusion/pull/20363#issuecomment-3905798461). [`scrape_comments.py`](https://github.com/alamb/datafusio

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905800999 run benchmarks DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905800566 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905798461 run benchmark tpds DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-15 Thread via GitHub
adriangb commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3905800281 run benchmark tpcds DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

[I] Set expected runtime config in error message when the used disk space during the spilling process has exceeded the allocation limit [datafusion]

2026-02-15 Thread via GitHub
erenavsarogullari opened a new issue, #20373: URL: https://github.com/apache/datafusion/issues/20373 ### Is your feature request related to a problem or challenge? Minor refactoring on error message by exposing required config name for the end user. This is follow-up PR to both PR: #2

[PR] perf: Optimize array_has() for scalar needle [datafusion]

2026-02-15 Thread via GitHub
neilconway opened a new pull request, #20374: URL: https://github.com/apache/datafusion/pull/20374 ## Which issue does this PR close? - Partially addresses #18181. ## Rationale for this change Previous observations in #18181 suggested that `array_has` is relativel

Re: [I] Set expected runtime config in error message when the used disk space during the spilling process has exceeded the allocation limit [datafusion]

2026-02-15 Thread via GitHub
erenavsarogullari commented on issue #20373: URL: https://github.com/apache/datafusion/issues/20373#issuecomment-3905193811 take

Re: [I] ClickBench Q10 slows down when filter pushdown is enabled [datafusion]

2026-02-15 Thread via GitHub
Dandandan commented on issue #20325: URL: https://github.com/apache/datafusion/issues/20325#issuecomment-3905282899 In addition to some of the overhead you already mentioned (CachedArrayReader / skips / filter + concat) that could be reduced, I think a lot is actually the IO pattern.

Re: [PR] perf: Optimize array_has() for scalar needle [datafusion]

2026-02-15 Thread via GitHub
neilconway commented on PR #20374: URL: https://github.com/apache/datafusion/pull/20374#issuecomment-3905768726 Benchmarks: ``` group vanilla opt -