Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131817130 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738) Details ``` Comparing HEAD and gene.bordegaray

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131839205 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738) Details ``` Comparing HEAD and gene.bordegaray

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131765173 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738) `Linux bench-c4131756738-554-2ghhk 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131765305 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738) `Linux bench-c4131756738-555-wj4xx 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131845797 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738) Details ``` Comparing HEAD and gene.bordegaray

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
Dandandan commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738 run benchmarks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131765616 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21152#issuecomment-4131756738) `Linux bench-c4131756738-556-9gs2g 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] feat: Support Job-Level Dependency with Async Job Activation [datafusion-ballista]

2026-03-25 Thread via GitHub
danielhumanmod commented on PR #1428: URL: https://github.com/apache/datafusion-ballista/pull/1428#issuecomment-4131355877 I tried a prototype locally; the e2e data flow looks like: ``` EXPLAIN ANALYZE │ ▼ planner.rs LogicalPlan::Analyze detected

Re: [PR] feat: Support Spark expression seconds_of_time [datafusion-comet]

2026-03-25 Thread via GitHub
0lai0 commented on PR #3618: URL: https://github.com/apache/datafusion-comet/pull/3618#issuecomment-4131356088 Hi @andygrove , could you please take a look when your schedule permits? Thanks : )

Re: [PR] bench: improve Iceberg TPC workflow and plan capture [datafusion-comet]

2026-03-25 Thread via GitHub
Shekharrajak commented on PR #3783: URL: https://github.com/apache/datafusion-comet/pull/3783#issuecomment-4131987552 Improvement results in TPC DS dataset: https://github.com/user-attachments/assets/d06f0aaa-c4e4-42c6-8d4f-34eca6bd85cb

[PR] Feat ffi physical optimizer rule [datafusion]

2026-03-25 Thread via GitHub
coderfender opened a new pull request, #21166: URL: https://github.com/apache/datafusion/pull/21166 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/20450 ## Rationale for this change ## What changes are included in t

Re: [I] Automated way to run benchmarks on a dedicated machine from PRs [datafusion]

2026-03-25 Thread via GitHub
adriangb commented on issue #18115: URL: https://github.com/apache/datafusion/issues/18115#issuecomment-4131068399 We've now been using https://github.com/adriangb/datafusion-benchmarking for a bit which runs on GKE Autopilot Performance class pods. The results seem pretty stable.

Re: [PR] feat: Support Job-Level Dependency with Async Job Activation [datafusion-ballista]

2026-03-25 Thread via GitHub
danielhumanmod commented on PR #1428: URL: https://github.com/apache/datafusion-ballista/pull/1428#issuecomment-4130960540 > thanks @danielhumanmod, first of all sorry for late review. > > I'm a bit puzzled which approach would be the best approach for this. > > I'm not sure if

Re: [PR] [branch-53] Substrait join consumer should not merge nullability of join keys (#21121) [datafusion]

2026-03-25 Thread via GitHub
comphead merged PR #21162: URL: https://github.com/apache/datafusion/pull/21162

Re: [PR] Improvement: keep order-preserving repartitions for streaming aggregates [datafusion]

2026-03-25 Thread via GitHub
xudong963 merged PR #21107: URL: https://github.com/apache/datafusion/pull/21107

Re: [PR] chore: Optimize schema rewriter usages [datafusion]

2026-03-25 Thread via GitHub
comphead merged PR #21158: URL: https://github.com/apache/datafusion/pull/21158

Re: [PR] chore: Optimize schema rewriter usages [datafusion]

2026-03-25 Thread via GitHub
comphead commented on PR #21158: URL: https://github.com/apache/datafusion/pull/21158#issuecomment-4131402629 I was able to see a 10% gain, not very impressive in terms of the entire heavyweight test. Tbh I was expecting more, and it is difficult to get accurate numbers as there are multiple proce

Re: [I] `native_datafusion` performance improvement [datafusion-comet]

2026-03-25 Thread via GitHub
comphead commented on issue #3748: URL: https://github.com/apache/datafusion-comet/issues/3748#issuecomment-4131407724 Part of it https://github.com/apache/datafusion/pull/21158 which bypasses schema adapter rewrite costs if there is no need in rewrite(like no filter involved and logical/p

Re: [PR] feat: Support Spark expression seconds_of_time [datafusion-comet]

2026-03-25 Thread via GitHub
andygrove commented on PR #3618: URL: https://github.com/apache/datafusion-comet/pull/3618#issuecomment-4131446720 @0lai0 could you fix the compilation issue? Thanks.

Re: [PR] feat: Support Spark expression window_time [datafusion-comet]

2026-03-25 Thread via GitHub
andygrove commented on PR #3732: URL: https://github.com/apache/datafusion-comet/pull/3732#issuecomment-4131451235 @0lai0 could you fix the lint issue? Thanks.

Re: [PR] fix: Df int timestamp cast fix failing CI [datafusion]

2026-03-25 Thread via GitHub
martin-g merged PR #21163: URL: https://github.com/apache/datafusion/pull/21163

[I] [DRAFT, EPIC] Benchmark improvements [datafusion]

2026-03-25 Thread via GitHub
adriangb opened a new issue, #21165: URL: https://github.com/apache/datafusion/issues/21165 I'm opening this epic to track improvements / changes we want to our benchmarking setup. I'll start by collecting some relevant issues: - https://github.com/apache/datafusion/issues/15511

Re: [I] Run benchmarks triggered by CI comment [datafusion]

2026-03-25 Thread via GitHub
adriangb closed issue #15583: Run benchmarks triggered by CI comment URL: https://github.com/apache/datafusion/issues/15583

Re: [I] Run benchmarks triggered by CI comment [datafusion]

2026-03-25 Thread via GitHub
adriangb commented on issue #15583: URL: https://github.com/apache/datafusion/issues/15583#issuecomment-4131058935 We now have this. The current iteration is at https://github.com/adriangb/datafusion-benchmarking

Re: [PR] dorny: rs change [datafusion-sandbox]

2026-03-25 Thread via GitHub
github-actions[bot] commented on PR #143: URL: https://github.com/apache/datafusion-sandbox/pull/143#issuecomment-4131079214 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [PR] dorny: md change [datafusion-sandbox]

2026-03-25 Thread via GitHub
github-actions[bot] commented on PR #142: URL: https://github.com/apache/datafusion-sandbox/pull/142#issuecomment-4131079300 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [PR] chore(deps): bump sphinx-reredirects from 1.0.0 to 1.1.0 in /docs [datafusion-sandbox]

2026-03-25 Thread via GitHub
github-actions[bot] commented on PR #102: URL: https://github.com/apache/datafusion-sandbox/pull/102#issuecomment-4131079412 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [I] Pluggable expression-level statistics estimation (ExpressionAnalyzer) [datafusion]

2026-03-25 Thread via GitHub
2010YOUY01 commented on issue #21120: URL: https://github.com/apache/datafusion/issues/21120#issuecomment-4131266771 > This proposal is, however, a little different from the other efforts tracked by existing epics like [#8227](https://github.com/apache/datafusion/issues/8227) and [#20766](

Re: [PR] feat: add granular repartition metrics [datafusion]

2026-03-25 Thread via GitHub
2010YOUY01 commented on PR #21152: URL: https://github.com/apache/datafusion/pull/21152#issuecomment-4131086948 I have some concerns about these low-level (kernel-profiling) metrics, so I’m sharing a few suggestions. (Not trying to block this, given this is useful to solve real problems—jus

Re: [PR] fix: Convert Spark columnar batches to Arrow in CometNativeWriteExec … [datafusion-comet]

2026-03-25 Thread via GitHub
github-actions[bot] closed pull request #3075: fix: Convert Spark columnar batches to Arrow in CometNativeWriteExec … URL: https://github.com/apache/datafusion-comet/pull/3075

Re: [PR] feat: Support Spark expression seconds_of_time [datafusion-comet]

2026-03-25 Thread via GitHub
0lai0 commented on PR #3618: URL: https://github.com/apache/datafusion-comet/pull/3618#issuecomment-4131455885 Sure, thanks for running the CI tests. I'll fix it right away.

Re: [PR] feat: Support Spark expression seconds_of_time [datafusion-comet]

2026-03-25 Thread via GitHub
andygrove commented on PR #3618: URL: https://github.com/apache/datafusion-comet/pull/3618#issuecomment-4131545302 SecondsOfTime is new in Spark 4.1.0 as far as I know, and Comet does not support Spark 4.1.0 yet

Re: [PR] feat: Support Spark expression window_time [datafusion-comet]

2026-03-25 Thread via GitHub
0lai0 commented on PR #3732: URL: https://github.com/apache/datafusion-comet/pull/3732#issuecomment-4131565159 Apologies for the oversight—I inadvertently removed a few lines while resolving merge conflicts, which triggered the error. I will take a look and push a fix to resolve it. Thanks

Re: [D] DISCUSSION: Seattle DataFusion Meetup (April 23, 2026) [datafusion]

2026-03-25 Thread via GitHub
GitHub user jiayuasu added a comment to the discussion: DISCUSSION: Seattle DataFusion Meetup (April 23, 2026) @alamb Hey Andrew, could you update the Luma event page to add my name and presentation there? Thank you! 😁 GitHub link: https://github.com/apache/datafusion/discussions/13500#discu

Re: [PR] fix: Df int timestamp cast fix failing CI [datafusion]

2026-03-25 Thread via GitHub
coderfender commented on PR #21163: URL: https://github.com/apache/datafusion/pull/21163#issuecomment-4130834067 Fixed test failure by resetting the timezone (perhaps this check got merged to main between original PR approval and merge)

Re: [PR] fix: Df int timestamp cast fix failing CI [datafusion]

2026-03-25 Thread via GitHub
coderfender commented on PR #21163: URL: https://github.com/apache/datafusion/pull/21163#issuecomment-4130843252 @martin-g, @neilconway, @alamb Please take a look and merge the PR to fix the CI issue. The only change is to reset the timezone after the tests run.

Re: [PR] Add metric category filtering for EXPLAIN ANALYZE [datafusion]

2026-03-25 Thread via GitHub
2010YOUY01 commented on code in PR #21160: URL: https://github.com/apache/datafusion/pull/21160#discussion_r2991955660 ## datafusion/common/src/format.rs: ## @@ -206,6 +206,142 @@ impl ConfigField for ExplainFormat { } } +/// Classifies a metric by what it measures. +///

Re: [PR] feat : support spark compatible int to timestamp cast [datafusion]

2026-03-25 Thread via GitHub
coderfender commented on PR #20555: URL: https://github.com/apache/datafusion/pull/20555#issuecomment-4130837438 Raised PR to fix CI : https://github.com/apache/datafusion/pull/20555

Re: [PR] fix: Df int timestamp cast fix failing CI [datafusion]

2026-03-25 Thread via GitHub
coderfender commented on PR #21163: URL: https://github.com/apache/datafusion/pull/21163#issuecomment-4130836841 https://github.com/apache/datafusion/pull/20555

Re: [PR] Fix/aggregate output ordering streaming [datafusion]

2026-03-25 Thread via GitHub
xudong963 commented on code in PR #21107: URL: https://github.com/apache/datafusion/pull/21107#discussion_r2986959269 ## datafusion/physical-optimizer/src/enforce_distribution.rs: ## @@ -928,6 +928,47 @@ fn add_hash_on_top( /// /// * `input`: Current node. /// +/// Checks whe

Re: [PR] [branch-52] Fix push_down_filter for children with non-empty fetch fields (#21057) [datafusion]

2026-03-25 Thread via GitHub
hareshkh commented on code in PR #21141: URL: https://github.com/apache/datafusion/pull/21141#discussion_r2984925611 ## datafusion/physical-plan/src/filter_pushdown.rs: ## @@ -359,6 +358,17 @@ impl ChildFilterDescription { }) } +/// Mark all parent filters as

Re: [PR] Update to arrow/parquet `58.1.0` [datafusion]

2026-03-25 Thread via GitHub
alamb merged PR #21044: URL: https://github.com/apache/datafusion/pull/21044

Re: [I] sqllogictests fail in debug mode (join_full_multi_batch::batch_size_4_2) [datafusion]

2026-03-25 Thread via GitHub
alamb closed issue #20689: sqllogictests fail in debug mode (join_full_multi_batch::batch_size_4_2) URL: https://github.com/apache/datafusion/issues/20689

Re: [PR] Update to arrow/parquet `58.1.0` [datafusion]

2026-03-25 Thread via GitHub
alamb commented on PR #21044: URL: https://github.com/apache/datafusion/pull/21044#issuecomment-4125141048 🚀

Re: [PR] fix(stats): widen sum_value integer arithmetic to SUM-compatible types [datafusion]

2026-03-25 Thread via GitHub
kumarUjjawal commented on PR #20865: URL: https://github.com/apache/datafusion/pull/20865#issuecomment-4125150316 @alamb this is good to go.

Re: [I] Debug assertions currently failing on `main` [datafusion]

2026-03-25 Thread via GitHub
alamb closed issue #20831: Debug assertions currently failing on `main` URL: https://github.com/apache/datafusion/issues/20831

Re: [PR] [bench] Disable prefetch morsel [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21143: URL: https://github.com/apache/datafusion/pull/21143#issuecomment-4124547504 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21143#issuecomment-4124473425) Details ``` Comparing HEAD and disable-prefetc

Re: [PR] [bench] Disable prefetch morsel [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21143: URL: https://github.com/apache/datafusion/pull/21143#issuecomment-4124573252 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21143#issuecomment-4124473425) Details ``` Comparing HEAD and disable-prefetc

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
Dandandan commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4124670369 run benchmark tpch

Re: [PR] fix: avoid internal errors for OneOf signature mismatches [datafusion]

2026-03-25 Thread via GitHub
myandpr commented on code in PR #21032: URL: https://github.com/apache/datafusion/pull/21032#discussion_r2986600430 ## datafusion/sqllogictest/test_files/spark/math/hex.slt: ## @@ -56,7 +56,7 @@ SELECT hex(column1) FROM VALUES (arrow_cast('hello', 'LargeBinary')), (NULL), (a N

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4124686000 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/20820#issuecomment-4124670369) `Linux bench-c4124670369-531-jkfbh 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
Dandandan commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4124676929 run benchmark clickbench_partitioned

Re: [PR] fix: avoid internal errors for OneOf signature mismatches [datafusion]

2026-03-25 Thread via GitHub
myandpr commented on code in PR #21032: URL: https://github.com/apache/datafusion/pull/21032#discussion_r2986597589 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -1223,6 +1227,96 @@ mod tests { Ok(()) } +#[test] +fn test_one_of_uses_generic_pla

Re: [PR] fix: avoid internal errors for OneOf signature mismatches [datafusion]

2026-03-25 Thread via GitHub
myandpr commented on code in PR #21032: URL: https://github.com/apache/datafusion/pull/21032#discussion_r2986595893 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -1223,6 +1227,96 @@ mod tests { Ok(()) } +#[test] +fn test_one_of_uses_generic_pla

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4124695215 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/20820#issuecomment-4124676929) `Linux bench-c4124676929-532-kvxkm 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] chore(deps): bump taiki-e/install-action from 2.68.34 to 2.69.7 [datafusion]

2026-03-25 Thread via GitHub
martin-g merged PR #21133: URL: https://github.com/apache/datafusion/pull/21133

Re: [PR] chore(deps): bump github/codeql-action from 4.33.0 to 4.34.1 [datafusion]

2026-03-25 Thread via GitHub
martin-g merged PR #21132: URL: https://github.com/apache/datafusion/pull/21132

[I] Release DataFusion-Python 53.0.0 [datafusion-python]

2026-03-25 Thread via GitHub
nuno-faria opened a new issue, #1442: URL: https://github.com/apache/datafusion-python/issues/1442 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Publish the next version of `datafusion-python` now that `datafusion-53` is out

[PR] feat: Add support for Spark Acosh, Asinh, Atanh math expressions [datafusion-comet]

2026-03-25 Thread via GitHub
rafafrdz opened a new pull request, #3787: URL: https://github.com/apache/datafusion-comet/pull/3787 ## Summary - Add support for Spark's `Acosh`, `Asinh`, and `Atanh` inverse hyperbolic trigonometric expressions - Delegates to DataFusion's built-in `acosh`, `asinh`, `atanh` functions

[PR] feat: Add support for Spark Pi math expression [datafusion-comet]

2026-03-25 Thread via GitHub
rafafrdz opened a new pull request, #3789: URL: https://github.com/apache/datafusion-comet/pull/3789 ## Summary - Add support for Spark's `Pi` constant expression - Delegates to DataFusion's built-in `pi()` function via `CometScalarFunction` - No native Rust code needed — DataFusion

[PR] feat: Add support for Spark Cbrt math expression [datafusion-comet]

2026-03-25 Thread via GitHub
rafafrdz opened a new pull request, #3788: URL: https://github.com/apache/datafusion-comet/pull/3788 ## Summary - Add support for Spark's `Cbrt` (cube root) expression - Delegates to DataFusion's built-in `cbrt` function via `CometScalarFunction` - No native Rust code needed — DataF

[PR] feat: Add support for Spark NaNvl math expression [datafusion-comet]

2026-03-25 Thread via GitHub
rafafrdz opened a new pull request, #3790: URL: https://github.com/apache/datafusion-comet/pull/3790 ## Summary - Add support for Spark's `NaNvl` expression (returns first arg if not NaN, otherwise second arg) - Delegates to DataFusion's built-in `nanvl` function via `CometScalarFunct

Re: [PR] Fix/aggregate output ordering streaming [datafusion]

2026-03-25 Thread via GitHub
xudong963 commented on PR #21107: URL: https://github.com/apache/datafusion/pull/21107#issuecomment-4125197296 @alamb thanks for the review, this PR is not trading parallelism for sortedness, and it does not remove the hash repartition. The plan still uses the same Hash repartition wi

Re: [PR] feat[substrait]: translate row_count statistics via RelCommon hint [datafusion]

2026-03-25 Thread via GitHub
etiennepelissier commented on code in PR #21112: URL: https://github.com/apache/datafusion/pull/21112#discussion_r2987878229 ## datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs: ## Review Comment: The final API evolved further: the parameter is now `hints: Sub

Re: [PR] Migrate Avro reader to arrow-avro and remove internal conversion code [datafusion]

2026-03-25 Thread via GitHub
getChan commented on code in PR #17861: URL: https://github.com/apache/datafusion/pull/17861#discussion_r2988124881 ## datafusion/datasource-avro/src/source.rs: ## @@ -56,22 +57,83 @@ impl AvroSource { } } -fn open(&self, reader: R) -> Result> { +fn open(

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
alamb commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4126547078 Oh boy, the tests pass and the perf looks good. I am working a little bit more on error handling/testing now. Then I'll start splitting it up into pieces

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4126564337 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/20820#issuecomment-4126544522) `Linux bench-c4126544522-540-g96gq 6.12.55+ #1 SMP Sun Feb 1 08:5

[PR] ci: restrict number of jobs during build stage [datafusion-python]

2026-03-25 Thread via GitHub
timsaucer opened a new pull request, #1443: URL: https://github.com/apache/datafusion-python/pull/1443 # Which issue does this PR close? Related to https://github.com/apache/datafusion-python/issues/1429 but we need to verify if it resolves the issue. # Rationale for this chan

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
alamb commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4126544522 run benchmark tpch

Re: [PR] feat: support GroupsAccumulator for first_value and last_value with string/binary types [datafusion]

2026-03-25 Thread via GitHub
UBarney commented on code in PR #21090: URL: https://github.com/apache/datafusion/pull/21090#discussion_r2988238458 ## datafusion/functions-aggregate/src/first_last/state.rs: ## @@ -0,0 +1,439 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4126627633 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/20820#issuecomment-4126544522) Details ``` Comparing HEAD and alamb_morsel_ap

Re: [PR] perf: Optimize `split_part`, support `Utf8View` [datafusion]

2026-03-25 Thread via GitHub
neilconway commented on code in PR #21119: URL: https://github.com/apache/datafusion/pull/21119#discussion_r2988242624 ## datafusion/functions/src/string/split_part.rs: ## @@ -123,71 +126,50 @@ impl ScalarUDFImpl for SplitPartFunc { // Unpack the ArrayRefs from the ar

[PR] feat(memory_pool): add `TrackConsumersPool::metrics()` to expose cons… [datafusion]

2026-03-25 Thread via GitHub
bert-beyondloops opened a new pull request, #21147: URL: https://github.com/apache/datafusion/pull/21147 ## Which issue does this PR close? - Closes #21146 ## Rationale for this change There is currently no way to programmatically inspect the memory consumption of in

Re: [PR] Migrate Avro reader to arrow-avro and remove internal conversion code [datafusion]

2026-03-25 Thread via GitHub
alamb commented on code in PR #17861: URL: https://github.com/apache/datafusion/pull/17861#discussion_r2988254405 ## datafusion/datasource-avro/src/source.rs: ## @@ -56,22 +57,83 @@ impl AvroSource { } } -fn open(&self, reader: R) -> Result> { +fn open( +

Re: [PR] feat: add approx_top_k aggregate function [datafusion]

2026-03-25 Thread via GitHub
sesteves commented on code in PR #20968: URL: https://github.com/apache/datafusion/pull/20968#discussion_r2987845664 ## datafusion/functions-aggregate/src/approx_top_k.rs: ## @@ -0,0 +1,1381 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
alamb commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4126374517 run benchmark clickbench_partitioned

Re: [I] Add example implementing filter pushdown [datafusion]

2026-03-25 Thread via GitHub
alamb commented on issue #21145: URL: https://github.com/apache/datafusion/issues/21145#issuecomment-4126139041 Note the example on https://github.com/apache/datafusion-site/pull/161 shows the setup needed, but the idea of this example would be to show how to use pruning predicate / gaurante

Re: [PR] Add end-to-end Parquet tests for List and LargeList struct schema evolution [datafusion]

2026-03-25 Thread via GitHub
alamb commented on code in PR #20840: URL: https://github.com/apache/datafusion/pull/20840#discussion_r2987983682 ## datafusion/core/tests/parquet/expr_adapter.rs: ## @@ -54,6 +56,399 @@ async fn write_parquet(batch: RecordBatch, store: Arc, path: &s store.put(&Path::from(p

Re: [PR] Add end-to-end Parquet tests for List and LargeList struct schema evolution [datafusion]

2026-03-25 Thread via GitHub
alamb commented on code in PR #20840: URL: https://github.com/apache/datafusion/pull/20840#discussion_r2988000406 ## datafusion/core/tests/parquet/expr_adapter.rs: ## @@ -54,6 +56,399 @@ async fn write_parquet(batch: RecordBatch, store: Arc, path: &s store.put(&Path::from(p

[I] Expose TrackConsumersPool memory consumer metrics programmatically [datafusion]

2026-03-25 Thread via GitHub
bert-beyondloops opened a new issue, #21146: URL: https://github.com/apache/datafusion/issues/21146 ### Is your feature request related to a problem or challenge? There is currently no way to programmatically inspect the memory consumption of individual consumers tracked by TrackConsu

Re: [PR] Migrate Avro reader to arrow-avro and remove internal conversion code [datafusion]

2026-03-25 Thread via GitHub
getChan commented on code in PR #17861: URL: https://github.com/apache/datafusion/pull/17861#discussion_r2988521468 ## datafusion/datasource-avro/src/source.rs: ## @@ -56,22 +57,83 @@ impl AvroSource { } } -fn open(&self, reader: R) -> Result> { +fn open(

Re: [PR] Migrate Avro reader to arrow-avro and remove internal conversion code [datafusion]

2026-03-25 Thread via GitHub
adriangb commented on code in PR #17861: URL: https://github.com/apache/datafusion/pull/17861#discussion_r2988569258 ## datafusion/datasource-avro/src/source.rs: ## @@ -56,22 +57,83 @@ impl AvroSource { } } -fn open(&self, reader: R) -> Result> { +fn open

Re: [PR] feat[substrait]: translate scan statistics (row_count, record_size) via RelCommon hints [datafusion]

2026-03-25 Thread via GitHub
etiennepelissier commented on code in PR #21112: URL: https://github.com/apache/datafusion/pull/21112#discussion_r2988571789 ## datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs: ## Review Comment: To close the loop: the final API has `resolve_table_ref` taking

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
Dandandan commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4127000112 run benchmark tpch

Re: [PR] fix: Prefer numeric in type coercion for comparisons [datafusion]

2026-03-25 Thread via GitHub
adriangb commented on PR #20426: URL: https://github.com/apache/datafusion/pull/20426#issuecomment-4127017141 I have not heard any push back or alternative proposals in [Discord](https://discord.com/channels/885562378132000778/1166447479609376850/1474493692118302934) or the [mailing list](

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4127015949 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/20820#issuecomment-4127000112) `Linux bench-c4127000112-541-9v9xj 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
Dandandan commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4127025615 run benchmark tpch

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4127048283 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/20820#issuecomment-4127025615) `Linux bench-c4127025615-542-vtl9j 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] [bench] Disable prefetch morsel [datafusion]

2026-03-25 Thread via GitHub
Dandandan commented on PR #21143: URL: https://github.com/apache/datafusion/pull/21143#issuecomment-4124473425 run benchmarks

Re: [PR] [bench] Disable prefetch morsel [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21143: URL: https://github.com/apache/datafusion/pull/21143#issuecomment-4124486471 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21143#issuecomment-4124473425) `Linux bench-c4124473425-529-4tktc 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] [bench] Disable prefetch morsel [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21143: URL: https://github.com/apache/datafusion/pull/21143#issuecomment-4124486473 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21143#issuecomment-4124473425) `Linux bench-c4124473425-528-bjtpk 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] [bench] Disable prefetch morsel [datafusion]

2026-03-25 Thread via GitHub
adriangbot commented on PR #21143: URL: https://github.com/apache/datafusion/pull/21143#issuecomment-4124487425 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21143#issuecomment-4124473425) `Linux bench-c4124473425-530-52gfj 6.12.55+ #1 SMP Sun Feb 1 08:5

Re: [PR] fix: handle inf/-inf in ShimSparkErrorConverter cast overflow [datafusion-comet]

2026-03-25 Thread via GitHub
manuzhang commented on code in PR #3768: URL: https://github.com/apache/datafusion-comet/pull/3768#discussion_r2987575870 ## spark/src/main/spark-3.4/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala: ## @@ -44,6 +44,24 @@ trait ShimSparkErrorConverter { private

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.35 [datafusion-sandbox]

2026-03-25 Thread via GitHub
dependabot[bot] closed pull request #192: chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.35 URL: https://github.com/apache/datafusion-sandbox/pull/192

Re: [PR] fix: Tuple IN null semantics for struct comparisons [datafusion]

2026-03-25 Thread via GitHub
xiedeyantu commented on PR #21054: URL: https://github.com/apache/datafusion/pull/21054#issuecomment-4125895792 > > This clarity will help manage user expectations and prevent confusion similar to the issues discussed in the DuckDB threads. > > Improving the documentation sounds like

[PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.69.9 [datafusion-sandbox]

2026-03-25 Thread via GitHub
dependabot[bot] opened a new pull request, #194: URL: https://github.com/apache/datafusion-sandbox/pull/194 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.67.13 to 2.69.9. Release notes Sourced from https://github.com/taiki-e/install-action/release

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.69.9 [datafusion-sandbox]

2026-03-25 Thread via GitHub
dependabot[bot] commented on PR #194: URL: https://github.com/apache/datafusion-sandbox/pull/194#issuecomment-4125885722 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.35 [datafusion-sandbox]

2026-03-25 Thread via GitHub
dependabot[bot] commented on PR #192: URL: https://github.com/apache/datafusion-sandbox/pull/192#issuecomment-4125885825 Superseded by #194.
