Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#issuecomment-3919387995 > thanks @killzoner just minor comments about tests, otherwise it looks good Thank you ! I guess there is only https://github.com/apache/datafusion-ballista/pull/1420

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919389759 @mattcuento @killzoner @danielhumanmod any chance for a review? thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] perf: Skip RowFilter when all predicate columns are in the projection [datafusion]

2026-02-18 Thread via GitHub
Dandandan commented on PR #20417: URL: https://github.com/apache/datafusion/pull/20417#issuecomment-3919405696 show benchmark queue

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#issuecomment-3919405378 > > thanks @killzoner just minor comments about tests, otherwise it looks good > > Thank you ! I guess there is only [#1420 (comment)](https://github.com/apache/da

Re: [PR] perf: Skip RowFilter when all predicate columns are in the projection [datafusion]

2026-02-18 Thread via GitHub
alamb-ghbot commented on PR #20417: URL: https://github.com/apache/datafusion/pull/20417#issuecomment-3919405819 🤖 Hi @Dandandan, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20417#issuecomment-3919405696). | Job | User | Benchmarks | Comment | |

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919468322 A quick search shows the default is still set in this file, though: https://github.com/killzoner/datafusion-ballista/blob/issue-1395/ballista/scheduler/scheduler_config_spec.toml#L63

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#discussion_r2820961020 ## ballista/client/tests/context_unsupported.rs: ## @@ -143,4 +149,52 @@ mod unsupported { Ok(()) } + +#[rstest] +#[case::standal

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#discussion_r2820965372 ## ballista/client/tests/context_unsupported.rs: ## @@ -143,4 +149,52 @@ mod unsupported { Ok(()) } + +#[rstest] +#[case::standal

[PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm opened a new pull request, #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461 # Which issue does this PR close? Closes #. # Rationale for this change Pull based scheduling strategy has been used as the default since version 43, without any va

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#discussion_r2821158685 ## ballista/core/src/serde/mod.rs: ## @@ -186,17 +191,63 @@ impl LogicalExtensionCodec for BallistaLogicalExtensionCodec { &self, buf:

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#discussion_r2821161537 ## ballista/core/src/serde/mod.rs: ## @@ -186,17 +191,63 @@ impl LogicalExtensionCodec for BallistaLogicalExtensionCodec { &self, buf:

Re: [I] Memory use trait / methods / macros [datafusion]

2026-02-18 Thread via GitHub
rluvaton commented on issue #19615: URL: https://github.com/apache/datafusion/issues/19615#issuecomment-3919551819 Is it similar to this? https://github.com/apache/datafusion/issues/16904#issuecomment-3193805380

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-18 Thread via GitHub
coderfender commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2823407820 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -332,6 +332,38 @@ abstract class CometTestBase } } +// inspired from spa

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#issuecomment-3921982197 > Woohoo! Nice work @milenkovicm. At a high level this all makes sense, acknowledging the TODOs. > > Some notes of my own for mile-high level understanding: >

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
danielhumanmod commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2823453244 ## ballista/scheduler/src/state/aqe/optimizer_rule/datafusion_patch.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) un

Re: [PR] deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [datafusion-comet]

2026-02-18 Thread via GitHub
mbutrovich commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2823463388 ## dev/diffs/3.5.8.diff: ## @@ -2795,7 +2795,7 @@ index d675503a8ba..f220892396e 100644 + } Review Comment: I think this file just has a whitespac

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2823470023 ## ballista/scheduler/src/state/aqe/optimizer_rule/datafusion_patch.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under

Re: [PR] perf: Optimize translate() UDF for scalar inputs [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on code in PR #20305: URL: https://github.com/apache/datafusion/pull/20305#discussion_r2823034152 ## datafusion/functions/src/unicode/translate.rs: ## @@ -46,10 +46,10 @@ use datafusion_macros::user_doc; +--+

Re: [PR] fix: handle Utf8View and LargeUtf8 separators in concat_ws [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on code in PR #20361: URL: https://github.com/apache/datafusion/pull/20361#discussion_r2823138113 ## datafusion/functions/src/string/concat_ws.rs: ## @@ -162,23 +156,55 @@ impl ScalarUDFImpl for ConcatWsFunc { // parse sep let sep = match

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
danielhumanmod commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2823541555 ## ballista/scheduler/src/state/aqe/optimizer_rule/distributed_exchange.rs: ## @@ -0,0 +1,140 @@ +// Licensed to the Apache Software Foundation (ASF

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2823556576 ## ballista/scheduler/src/state/aqe/optimizer_rule/distributed_exchange.rs: ## @@ -0,0 +1,140 @@ +// Licensed to the Apache Software Foundation (ASF) u

Re: [PR] deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [datafusion-comet]

2026-02-18 Thread via GitHub
comphead commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2823286441 ## native/core/src/execution/operators/iceberg_scan.rs: ## @@ -221,15 +218,12 @@ impl IcebergScanMetrics { /// Wrapper around iceberg-rust's stream that pe

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-18 Thread via GitHub
vinaygjain commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3921863658 I am on branch-52 and just benchmarked Q40 and see a speed-up of 35% (20 ms without vs. 13 ms with) **SET datafusion.execution.parquet.pushdown_filters = false;**

Re: [PR] perf: exclude aggregate dynamic filters from parquet row-level evaluation [datafusion]

2026-02-18 Thread via GitHub
notashes closed pull request #20332: perf: exclude aggregate dynamic filters from parquet row-level evaluation URL: https://github.com/apache/datafusion/pull/20332

Re: [PR] [datafusion-spark] Implement map function [datafusion]

2026-02-18 Thread via GitHub
Shekharrajak commented on code in PR #20358: URL: https://github.com/apache/datafusion/pull/20358#discussion_r2823188990 ## datafusion/sqllogictest/test_files/spark/collection/size.slt: ## @@ -60,12 +60,12 @@ SELECT size(arrow_cast(make_array(1, 2, 3, 4), 'FixedSizeList(4, Int3

Re: [PR] feat: add regexp_instr function [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer merged PR #1382: URL: https://github.com/apache/datafusion-python/pull/1382

Re: [PR] feat: add regexp_instr function [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer commented on PR #1382: URL: https://github.com/apache/datafusion-python/pull/1382#issuecomment-3921870439 Nice work! Thank you!

Re: [PR] [Minor] Fix error messages for `shrink` and `try_shrink` [datafusion]

2026-02-18 Thread via GitHub
hareshkh commented on code in PR #20422: URL: https://github.com/apache/datafusion/pull/20422#discussion_r2823323861 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -387,7 +387,9 @@ impl MemoryReservation { atomic::Ordering::Relaxed, |prev

Re: [PR] Fix Python UDAF list-of-timestamps return by enforcing list-valued scalars and caching PyArrow types [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer commented on PR #1347: URL: https://github.com/apache/datafusion-python/pull/1347#issuecomment-3921628046 Thank you for all the work on this @kosiew! I think the final solution is very nice. We couldn't have gotten here so quickly without your work.

Re: [I] Incorrect cast of integer columns to utf8 when comparing with utf8 constant [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on issue #15161: URL: https://github.com/apache/datafusion/issues/15161#issuecomment-3922429717 take

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-18 Thread via GitHub
coderfender commented on PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#issuecomment-3922495630 Thank you for merging main and for the suggestions on testing long to timestamp, @andygrove, @mbutrovich

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3920366464 The push policy is not supported in the standalone executor; I am looking at backporting this

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3920411176 good catch

Re: [PR] Fix name tracker [datafusion]

2026-02-18 Thread via GitHub
LiaCastaneda commented on code in PR #19856: URL: https://github.com/apache/datafusion/pull/19856#discussion_r2821386586 ## datafusion/substrait/src/logical_plan/consumer/utils.rs: ## @@ -396,10 +433,25 @@ impl NameTracker { &mut self, expr: Expr, ) -> dat

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919942944 > https://deepwiki.com/apache/datafusion-ballista does not look bad, some diagrams make a lot of sense > > https://deepwiki.com/search/pull-based-vs-push-based_d01ad

Re: [PR] Basic Extension Type Registry Implementation [datafusion]

2026-02-18 Thread via GitHub
tobixdev commented on code in PR #20312: URL: https://github.com/apache/datafusion/pull/20312#discussion_r2803054679 ## datafusion/common/src/types/extension.rs: ## @@ -0,0 +1,71 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Disallow order by within ordered-set aggregate functions argument lis… [datafusion]

2026-02-18 Thread via GitHub
cj-zhukov commented on PR #20421: URL: https://github.com/apache/datafusion/pull/20421#issuecomment-3920591723 ### High-level overview This PR fixes incorrect handling of ordered-set aggregate syntax. Specifically: - Updated `datafusion/sql/src/expr/function.rs` to return a plann

Re: [PR] perf: Optimize concat()/concat_ws() UDFs [datafusion]

2026-02-18 Thread via GitHub
martin-g commented on code in PR #20317: URL: https://github.com/apache/datafusion/pull/20317#discussion_r2822792245 ## datafusion/functions/src/string/concat.rs: ## @@ -207,7 +207,7 @@ impl ScalarUDFImpl for ConcatFunc { DataType::Utf8View => {

Re: [PR] docs: Update Parquet scan documentation [datafusion-comet]

2026-02-18 Thread via GitHub
mbutrovich merged PR #3433: URL: https://github.com/apache/datafusion-comet/pull/3433

Re: [PR] perf: Optimize translate() UDF for scalar inputs [datafusion]

2026-02-18 Thread via GitHub
Jefffrey commented on code in PR #20305: URL: https://github.com/apache/datafusion/pull/20305#discussion_r2822774611 ## datafusion/functions/src/unicode/translate.rs: ## @@ -99,6 +100,65 @@ impl ScalarUDFImpl for TranslateFunc { &self, args: datafusion_expr::Sc

Re: [I] feat: Add automation to manage stale issues and PRs [datafusion-site]

2026-02-18 Thread via GitHub
Abhinandankaushik commented on issue #150: URL: https://github.com/apache/datafusion-site/issues/150#issuecomment-3921352808 Hey @Jefffrey, should I close this issue?

Re: [PR] perf: defer expensive string predicates from `RowFilter` when dynamic filter is present [datafusion]

2026-02-18 Thread via GitHub
notashes commented on code in PR #20413: URL: https://github.com/apache/datafusion/pull/20413#discussion_r2822834340 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -459,27 +465,55 @@ impl FileOpener for ParquetOpener { // `row_filter` for details.

Re: [PR] perf: defer expensive string predicates from `RowFilter` when dynamic filter is present [datafusion]

2026-02-18 Thread via GitHub
notashes closed pull request #20413: perf: defer expensive string predicates from `RowFilter` when dynamic filter is present URL: https://github.com/apache/datafusion/pull/20413

Re: [PR] perf: defer expensive string predicates from `RowFilter` when dynamic filter is present [datafusion]

2026-02-18 Thread via GitHub
notashes commented on PR #20413: URL: https://github.com/apache/datafusion/pull/20413#issuecomment-3921380779 closed in favour of #20417

Re: [I] feat: Add automation to manage stale issues and PRs [datafusion-site]

2026-02-18 Thread via GitHub
Jefffrey commented on issue #150: URL: https://github.com/apache/datafusion-site/issues/150#issuecomment-3921393608 Thanks for the suggestion, but I don't think there's too much value to be had here

Re: [I] feat: Add automation to manage stale issues and PRs [datafusion-site]

2026-02-18 Thread via GitHub
Jefffrey closed issue #150: feat: Add automation to manage stale issues and PRs URL: https://github.com/apache/datafusion-site/issues/150

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#issuecomment-3922342451 > LGTM, thanks for building the skeleton and leaving the TODOs for future contributions. > > For better understanding, I've summarized my understanding of the AQE

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#issuecomment-3922375193 I'd recommend reading if you're interested https://www.cs.cmu.edu/~15721-f24/papers/AQP_in_Lakehouse.pdf

Re: [PR] chore: Cleanup returning null arrays [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on PR #20423: URL: https://github.com/apache/datafusion/pull/20423#issuecomment-3922245690 cc @Jefffrey

[PR] chore: Cleanup returning null arrays [datafusion]

2026-02-18 Thread via GitHub
neilconway opened a new pull request, #20423: URL: https://github.com/apache/datafusion/pull/20423 Cleanup a few places where the code returned a null array but it would be a bit cleaner and faster to return a typed scalar null instead. ## Which issue does this PR close? Does n

Re: [PR] perf: Skip RowFilter when all predicate columns are in the projection [datafusion]

2026-02-18 Thread via GitHub
alamb-ghbot commented on PR #20417: URL: https://github.com/apache/datafusion/pull/20417#issuecomment-3922312196 🤖 Hi @darmie, thanks for the request (https://github.com/apache/datafusion/pull/20417#issuecomment-3922311434). [`scrape_comments.py`](https://github.com/alamb/datafusion-benchma

Re: [PR] perf: Skip RowFilter when all predicate columns are in the projection [datafusion]

2026-02-18 Thread via GitHub
darmie commented on PR #20417: URL: https://github.com/apache/datafusion/pull/20417#issuecomment-3922311434 run benchmark clickbench_partitioned DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

[PR] minor: update cargo dependencies [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer opened a new pull request, #1383: URL: https://github.com/apache/datafusion-python/pull/1383 This is simply a `cargo update` command. I did a drive by removal of steps that should no longer be required for CI during testing now that all of the building happens in a different

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-18 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2823786180 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -450,6 +531,25 @@ impl PhysicalExpr for DynamicFilterPhysicalExpr { } } +///

Re: [PR] minor: update cargo dependencies [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer merged PR #1383: URL: https://github.com/apache/datafusion-python/pull/1383

Re: [PR] build(deps): bump uuid from 1.20.0 to 1.21.0 [datafusion-python]

2026-02-18 Thread via GitHub
dependabot[bot] commented on PR #1380: URL: https://github.com/apache/datafusion-python/pull/1380#issuecomment-3922432287 Looks like uuid is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump arrow-select from 57.2.0 to 57.3.0 [datafusion-python]

2026-02-18 Thread via GitHub
dependabot[bot] commented on PR #1373: URL: https://github.com/apache/datafusion-python/pull/1373#issuecomment-3922432292 Looks like arrow-select is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump uuid from 1.20.0 to 1.21.0 [datafusion-python]

2026-02-18 Thread via GitHub
dependabot[bot] closed pull request #1380: build(deps): bump uuid from 1.20.0 to 1.21.0 URL: https://github.com/apache/datafusion-python/pull/1380

Re: [I] Incorrect cast of integer columns to utf8 when comparing with utf8 constant [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on issue #15161: URL: https://github.com/apache/datafusion/issues/15161#issuecomment-3922429375 @AlonSpivack Thanks for flagging this! I'm sorry that this resulted in incorrect query results in production for you. At a high-level, I agree with your analysis, and a

Re: [PR] build(deps): bump arrow-select from 57.2.0 to 57.3.0 [datafusion-python]

2026-02-18 Thread via GitHub
dependabot[bot] closed pull request #1373: build(deps): bump arrow-select from 57.2.0 to 57.3.0 URL: https://github.com/apache/datafusion-python/pull/1373

Re: [PR] build(deps): bump arrow from 57.2.0 to 57.3.0 [datafusion-python]

2026-02-18 Thread via GitHub
dependabot[bot] commented on PR #1374: URL: https://github.com/apache/datafusion-python/pull/1374#issuecomment-3922432577 Looks like arrow is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump arrow from 57.2.0 to 57.3.0 [datafusion-python]

2026-02-18 Thread via GitHub
dependabot[bot] closed pull request #1374: build(deps): bump arrow from 57.2.0 to 57.3.0 URL: https://github.com/apache/datafusion-python/pull/1374

Re: [PR] [Oracle] Table alias for INSERTed table [datafusion-sqlparser-rs]

2026-02-18 Thread via GitHub
xitep commented on code in PR #2214: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2214#discussion_r2821592152 ## src/parser/mod.rs: ## @@ -4543,7 +4543,13 @@ impl<'a> Parser<'a> { /// /// Returns true if the current token matches the expected keyword.

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2824080937 ## ballista/scheduler/src/state/aqe/optimizer_rule/datafusion_patch.rs: ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on code in PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#discussion_r2821205020 ## ballista/core/src/serde/mod.rs: ## @@ -186,17 +191,63 @@ impl LogicalExtensionCodec for BallistaLogicalExtensionCodec { &self, bu

Re: [PR] perf: Skip RowFilter when all predicate columns are in the projection [datafusion]

2026-02-18 Thread via GitHub
Dandandan commented on PR #20417: URL: https://github.com/apache/datafusion/pull/20417#issuecomment-3921232137 @alamb looks like the runner isn't working

Re: [PR] perf: defer expensive string predicates from `RowFilter` when dynamic filter is present [datafusion]

2026-02-18 Thread via GitHub
Dandandan commented on code in PR #20413: URL: https://github.com/apache/datafusion/pull/20413#discussion_r2822745501 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -459,27 +465,55 @@ impl FileOpener for ParquetOpener { // `row_filter` for details.

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3921569620 > Results look great! Good call on the standalone. > > Separately, curious if either of you have played with tuning the loop interval for pull-based? I see this con

Re: [PR] Fix infinite recursion in Session impl for SessionState [datafusion]

2026-02-18 Thread via GitHub
vkverma9534 commented on PR #20148: URL: https://github.com/apache/datafusion/pull/20148#issuecomment-3921644926 @Jefffrey Thanks for the consideration. You’re right to be skeptical here: this was initially LLM-assisted, and on a closer look my original suspicion doesn’t hold up the way

Re: [PR] fix: multi-insert with native writer in Spark 4.x (#3430) [datafusion-comet]

2026-02-18 Thread via GitHub
Shekharrajak commented on PR #3530: URL: https://github.com/apache/datafusion-comet/pull/3530#issuecomment-3921649071 @coderfender, please have a look. I think these code changes will resolve multiple issues with the writer.

Re: [I] Cannot do udaf that returns list of timestamps [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer closed issue #1339: Cannot do udaf that returns list of timestamps URL: https://github.com/apache/datafusion-python/issues/1339

Re: [PR] Fix Python UDAF list-of-timestamps return by enforcing list-valued scalars and caching PyArrow types [datafusion-python]

2026-02-18 Thread via GitHub
timsaucer merged PR #1347: URL: https://github.com/apache/datafusion-python/pull/1347

Re: [I] Count() and Count(Distinct )should accept multiple exprs [datafusion]

2026-02-18 Thread via GitHub
Mark1626 commented on issue #5619: URL: https://github.com/apache/datafusion/issues/5619#issuecomment-3921668952 take

Re: [I] Count() and Count(Distinct )should accept multiple exprs [datafusion]

2026-02-18 Thread via GitHub
Mark1626 commented on issue #5619: URL: https://github.com/apache/datafusion/issues/5619#issuecomment-3921674034 The old PR for count distinct is stale and outdated (the code of COUNT seems to have been refactored a lot). I'll be re-implementing it

Re: [I] Remove `From` for `Column` [datafusion]

2026-02-18 Thread via GitHub
ishanema03 commented on issue #17375: URL: https://github.com/apache/datafusion/issues/17375#issuecomment-3921673344 Hi @findepi, I have raised a PR for this issue

Re: [PR] fix: handle Utf8View and LargeUtf8 separators in concat_ws [datafusion]

2026-02-18 Thread via GitHub
Jefffrey commented on code in PR #20361: URL: https://github.com/apache/datafusion/pull/20361#discussion_r2822331972 ## datafusion/functions/src/string/concat_ws.rs: ## @@ -162,23 +156,55 @@ impl ScalarUDFImpl for ConcatWsFunc { // parse sep let sep = match &

Re: [I] bug: Disable `DataFrame.cache()` for ballista [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm closed issue #1395: bug: Disable `DataFrame.cache()` for ballista URL: https://github.com/apache/datafusion-ballista/issues/1395

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm merged PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420

Re: [PR] fix: remove `scheduler_config_spec.toml` as it is unused [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm merged PR #1462: URL: https://github.com/apache/datafusion-ballista/pull/1462

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#issuecomment-3920846943 merging this, thanks @killzoner

Re: [PR] perf: Optimize `array_has_any()` with scalar arg [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on PR #20385: URL: https://github.com/apache/datafusion/pull/20385#issuecomment-3921060646 Note that the commit fixing up the benchmarks is shared with #20374 -- I can also pull that out into a separate PR, because it's a prerequisite for any performance work on these f

Re: [PR] perf: Optimize scalar fast path for `regexp_like` [datafusion]

2026-02-18 Thread via GitHub
Jefffrey commented on code in PR #20354: URL: https://github.com/apache/datafusion/pull/20354#discussion_r2822558042 ## datafusion/functions/src/regex/regexplike.rs: ## @@ -131,28 +132,39 @@ impl ScalarUDFImpl for RegexpLikeFunc { ) -> Result { let args = &args.arg

Re: [PR] perf: Optimize concat()/concat_ws() UDFs [datafusion]

2026-02-18 Thread via GitHub
neilconway commented on code in PR #20317: URL: https://github.com/apache/datafusion/pull/20317#discussion_r2822606960 ## datafusion/functions/src/string/concat.rs: ## @@ -207,7 +207,7 @@ impl ScalarUDFImpl for ConcatFunc { DataType::Utf8View => {

Re: [PR] [Minor] Fix error messages for `shrink` and `try_shrink` [datafusion]

2026-02-18 Thread via GitHub
getChan commented on code in PR #20422: URL: https://github.com/apache/datafusion/pull/20422#discussion_r2822568251 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -387,7 +387,9 @@ impl MemoryReservation { atomic::Ordering::Relaxed, |prev|

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2824106510 ## ballista/scheduler/src/state/aqe/optimizer_rule/distributed_exchange.rs: ## @@ -0,0 +1,140 @@ +// Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Port regex_extract [datafusion]

2026-02-18 Thread via GitHub
rluvaton commented on code in PR #20308: URL: https://github.com/apache/datafusion/pull/20308#discussion_r2824158713 ## datafusion/functions/src/regex/mod.rs: ## @@ -65,6 +67,19 @@ pub mod expr_fn { super::regexp_match().call(args) } +/// Extracts a group tha

Re: [PR] Port regex_extract [datafusion]

2026-02-18 Thread via GitHub
rluvaton commented on code in PR #20308: URL: https://github.com/apache/datafusion/pull/20308#discussion_r2824164646 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,551 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on code in PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#discussion_r2824191028 ## ballista/scheduler/src/state/aqe/optimizer_rule/distributed_exchange.rs: ## @@ -0,0 +1,140 @@ +// Licensed to the Apache Software Foundation (ASF) u

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919607092 > https://private-user-images.githubusercontent.com/3322938/551420965-8acd63cd-bb2d-4733-83e6-dd2a2a8319be.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIu

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919608665 > Quick search shows old default still on this file though : https://github.com/killzoner/datafusion-ballista/blob/issue-1395/ballista/scheduler/scheduler_config_spec.toml

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919652532 I believe sleep at https://github.com/milenkovicm/datafusion-ballista/blob/0e1bcd278e58ad81f98dcb5cec8df8628331b54a/ballista/executor/src/execution_loop.rs#L205-L206 does

Re: [PR] MSSQL: Add support for WAITFOR statement [datafusion-sqlparser-rs]

2026-02-18 Thread via GitHub
iffyio commented on code in PR #2210: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2210#discussion_r2821252792 ## tests/sqlparser_mssql.rs: ## @@ -1702,6 +1702,43 @@ fn test_parse_throw() { ); } +#[test] +fn test_parse_waitfor() { +// WAITFOR DELAY +

Re: [PR] feat: add `Dataframe.cache()` factory (no planner handling) [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1420: URL: https://github.com/apache/datafusion-ballista/pull/1420#discussion_r2821295163 ## ballista/core/src/serde/mod.rs: ## @@ -186,17 +191,63 @@ impl LogicalExtensionCodec for BallistaLogicalExtensionCodec { &self, buf:

[PR] fix: remove `scheduler_config_spec.toml` as it is unused [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm opened a new pull request, #1462: URL: https://github.com/apache/datafusion-ballista/pull/1462 # Which issue does this PR close? Closes #. # Rationale for this change we have removed option to configure ballista using toml files (dependencies was unmaintaine

Re: [PR] fix: remove `scheduler_config_spec.toml` as it is unused [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1462: URL: https://github.com/apache/datafusion-ballista/pull/1462#issuecomment-3919639089 cc @killzoner

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
milenkovicm commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919640927 > Quick search shows old default still on this file though : https://github.com/killzoner/datafusion-ballista/blob/issue-1395/ballista/scheduler/scheduler_config_spec.toml

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#issuecomment-3919754197 > I believe sleep at https://github.com/milenkovicm/datafusion-ballista/blob/0e1bcd278e58ad81f98dcb5cec8df8628331b54a/ballista/executor/src/execution_loop.rs#L205-L206 does

Re: [PR] feat: Make push scheduling policy default as it has ~2.5x lower latency [datafusion-ballista]

2026-02-18 Thread via GitHub
killzoner commented on code in PR #1461: URL: https://github.com/apache/datafusion-ballista/pull/1461#discussion_r2821824193 ## ballista/executor/src/config.rs: ## @@ -78,7 +78,7 @@ pub struct Config { )] pub concurrent_tasks: usize, /// Task scheduling policy: p
