Re: [I] Blog post about 1000 distinct committers / history of the project [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21305: URL: https://github.com/apache/datafusion/issues/21305#issuecomment-4202864071 I suggest using `git` directly to count contributors if we want to publish numbers -- This is an automated message from the Apache Git Service. To respond to the message, please
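
A minimal sketch of counting distinct contributors with `git` directly. It assumes the count should come from commit authorship as recorded by `git log`, and the helper names `git_author_lines` and `count_distinct_authors` are hypothetical. The `%aN`/`%aE` placeholders honor `.mailmap`, so aliases merge only if the repository maps them. Use `%cN <%cE>` instead to count committer identities. `Co-authored-by:` trailers are not counted.

```python
import subprocess
from typing import Iterable, List


def git_author_lines(repo: str = ".") -> List[str]:
    """Return one "Name <email>" line per commit reachable from HEAD."""
    # %aN / %aE apply .mailmap, collapsing aliases the project has mapped.
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aN <%aE>"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def count_distinct_authors(lines: Iterable[str]) -> int:
    """Count distinct authors, keyed case-insensitively on the email address."""
    emails = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        start, end = line.rfind("<"), line.rfind(">")
        # Fall back to the whole line if no "<email>" part is present.
        key = line[start + 1 : end] if 0 <= start < end else line
        emails.add(key.lower())
    return len(emails)
```

For example, `count_distinct_authors(git_author_lines("path/to/datafusion"))` returns the number of distinct author emails. A rough shell equivalent is `git shortlog -sne HEAD | wc -l`, which counts distinct name/email pairs instead of emails alone.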

Re: [PR] fix: Make cast string to timestamp compatible with Spark [datafusion-comet]

2026-04-07 Thread via GitHub
parthchandra merged PR #3884: URL: https://github.com/apache/datafusion-comet/pull/3884

Re: [PR] fix: Make cast string to timestamp compatible with Spark [datafusion-comet]

2026-04-07 Thread via GitHub
parthchandra commented on PR #3884: URL: https://github.com/apache/datafusion-comet/pull/3884#issuecomment-4203182004 Merged. Thank you @kazuyukitanimura !!

Re: [I] Regression in json performance for local files [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21450: URL: https://github.com/apache/datafusion/issues/21450#issuecomment-4202879078 Thanks @ariel-miculas -- I think since this is a regression, we should make sure it is done before we release 54.0.0 -- I added it to the list on - https://github.com/apache/da

Re: [I] Json support in clickbench benchmark [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21446: URL: https://github.com/apache/datafusion/issues/21446#issuecomment-4202882248 Yes I agree this would be very useful. Thanks @ariel-miculas

[PR] [branch-53] fix: use datafusion_expr instead of datafusion crate in spark bitmap/… [datafusion]

2026-04-07 Thread via GitHub
comphead opened a new pull request, #21452: URL: https://github.com/apache/datafusion/pull/21452 …math modules (cherry picked from commit 39fb9cca79db1a79e6f8ee01af79df9b59a8ec00) ## Which issue does this PR close? - Closes #. ## Rationale for this change

Re: [I] Release DataFusion `53.1.0` (minor) (Apr 2026) [datafusion]

2026-04-07 Thread via GitHub
comphead commented on issue #21079: URL: https://github.com/apache/datafusion/issues/21079#issuecomment-4203114599 #20900 was applied before; it's probably missing from the changelog. #21293 backport: https://github.com/apache/datafusion/pull/21451 #21043 backport: https://github.com/apache/datafu

Re: [PR] fix: Iceberg reflection for current() on TableOperations hierarchy [datafusion-comet]

2026-04-07 Thread via GitHub
parthchandra commented on code in PR #3895: URL: https://github.com/apache/datafusion-comet/pull/3895#discussion_r3048736920 ## spark/src/main/scala/org/apache/comet/iceberg/IcebergReflection.scala: ## @@ -228,11 +228,18 @@ object IcebergReflection extends Logging { v

Re: [PR] fix: Iceberg reflection for current() on TableOperations hierarchy [datafusion-comet]

2026-04-07 Thread via GitHub
karuppayya commented on PR #3895: URL: https://github.com/apache/datafusion-comet/pull/3895#issuecomment-4202778821 cc: @andygrove @mbutrovich @parthchandra, can any of you help review? Also, it looks like the workflow needs maintainer approval to progress (but remember that it used to work w

Re: [PR] Estimate aggregate output rows using existing NDV statistics [datafusion]

2026-04-07 Thread via GitHub
2010YOUY01 commented on PR #20926: URL: https://github.com/apache/datafusion/pull/20926#issuecomment-4203083163 Great! Thanks everyone.

Re: [PR] Estimate aggregate output rows using existing NDV statistics [datafusion]

2026-04-07 Thread via GitHub
2010YOUY01 merged PR #20926: URL: https://github.com/apache/datafusion/pull/20926
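
A common heuristic for estimating GROUP BY output cardinality from NDV (number of distinct values) statistics multiplies the per-column NDVs and caps the result at the input row count. The sketch below illustrates only that general heuristic. It is an assumption that PR #20926 works along these lines, and `estimate_group_by_rows` is a hypothetical name.

```python
from math import prod
from typing import Optional, Sequence


def estimate_group_by_rows(
    input_rows: Optional[int], ndvs: Sequence[Optional[int]]
) -> Optional[int]:
    """Estimate the output row count of a GROUP BY from per-column NDVs.

    Returns None (unknown) if any grouping column has no NDV statistic.
    NULL handling is ignored for simplicity; a nullable column can add
    one extra NULL group on top of its NDV.
    """
    if not ndvs:
        return 1  # an aggregate with no GROUP BY keys yields a single row
    if any(n is None for n in ndvs):
        return None
    # Upper bound: every combination of distinct values occurs.
    combos = prod(ndvs)
    # There can never be more groups than input rows.
    return combos if input_rows is None else min(combos, input_rows)
```

The cap at `input_rows` matters when grouping on several high-NDV columns, where the product of NDVs quickly exceeds the number of rows that actually exist.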

Re: [PR] fix: preserve duplicate GROUPING SETS rows [datafusion]

2026-04-07 Thread via GitHub
xiedeyantu commented on PR #21058: URL: https://github.com/apache/datafusion/pull/21058#issuecomment-4202839556 @neilconway @alamb Thank you for the review! I've made revisions based on the comments. Please help me take another look.

[PR] [branch-53] fix: use spill writer's schema instead of the first batch schema for … [datafusion]

2026-04-07 Thread via GitHub
comphead opened a new pull request, #21451: URL: https://github.com/apache/datafusion/pull/21451 …spill files (cherry picked from commit e133dd3873a8a8ee9c57f977457e89037f992725) ## Which issue does this PR close? - Closes #. ## Rationale for this change

Re: [I] Propagate orderings through struct-producing projections [datafusion]

2026-04-07 Thread via GitHub
xudong963 closed issue #21217: Propagate orderings through struct-producing projections URL: https://github.com/apache/datafusion/issues/21217

Re: [PR] feat: Propagate orderings through struct-producing projections [datafusion]

2026-04-07 Thread via GitHub
xudong963 merged PR #21218: URL: https://github.com/apache/datafusion/pull/21218

[I] Enable test `cast nested ArrayType to nested ArrayType` [datafusion-comet]

2026-04-07 Thread via GitHub
manuzhang opened a new issue, #3906: URL: https://github.com/apache/datafusion-comet/issues/3906 The `cast nested ArrayType to nested ArrayType` test was ignored in #2897 due to a `java.lang.OutOfMemoryError: Java heap space` failure. We need to look into the root cause, fix it, and re-enable the test

Re: [PR] Runs on step 1 asf [datafusion-sandbox]

2026-04-07 Thread via GitHub
github-actions[bot] commented on PR #163: URL: https://github.com/apache/datafusion-sandbox/pull/163#issuecomment-4203363761 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or commen

Re: [I] `approx_distinct` should be leveraging bitmap for counting u8/16 and i8/16 [datafusion]

2026-04-07 Thread via GitHub
coderfender commented on issue #1109: URL: https://github.com/apache/datafusion/issues/1109#issuecomment-4204026539 take

Re: [PR] fix: Make cast string to timestamp compatible with Spark [datafusion-comet]

2026-04-07 Thread via GitHub
parthchandra commented on code in PR #3884: URL: https://github.com/apache/datafusion-comet/pull/3884#discussion_r3048389187 ## native/core/src/execution/planner.rs: ## @@ -406,7 +406,12 @@ impl PhysicalPlanner { Ok(Arc::new(Cast::new( child

Re: [PR] fix: preserve duplicate GROUPING SETS rows [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21058: URL: https://github.com/apache/datafusion/pull/21058#discussion_r3048457709 ## datafusion/optimizer/src/analyzer/resolve_grouping_function.rs: ## @@ -184,40 +191,43 @@ fn validate_args( fn grouping_function_on_id( function: &Aggre
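
`GROUP BY GROUPING SETS (...)` is commonly defined as equivalent to a `UNION ALL` of one `GROUP BY` per set. Under that definition, a duplicated grouping set should produce duplicated rows. A minimal Python sketch of that semantics follows. It does not model DataFusion's internal grouping-id column or how PR #21058 tells duplicates apart, and `count_by_grouping_sets` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def count_by_grouping_sets(
    rows: Sequence[Row],
    columns: Sequence[str],
    grouping_sets: Sequence[Sequence[str]],
) -> List[Tuple[object, ...]]:
    """Evaluate COUNT(*) ... GROUP BY GROUPING SETS as a UNION ALL.

    Each grouping set is aggregated on its own and the results are
    concatenated, so a set listed twice produces its groups twice.
    Columns outside the current set are emitted as None (SQL NULL).
    """
    out: List[Tuple[object, ...]] = []
    for gset in grouping_sets:
        counts: Dict[Tuple[object, ...], int] = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            counts[key] += 1
        out.extend(key + (n,) for key, n in counts.items())
    return out
```

With `GROUPING SETS ((a), (a))` over rows `a = 1, 1, 2`, this yields `(1, 2)` and `(2, 1)` twice each, rather than collapsing the two identical sets into one.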

Re: [I] `ProjectionExec` produces unknown statistics for all `ScalarFunctionExpr` outputs [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21307: URL: https://github.com/apache/datafusion/issues/21307#issuecomment-4202868788 Maybe this is something that @xudong963 or @jonathanc-n have some insight into. We already have some notion of expression range analysis (that could be applied to the problem

Re: [PR] feat: add BuildHasher variants for hash_utils [datafusion]

2026-04-07 Thread via GitHub
adriangbot commented on PR #21429: URL: https://github.com/apache/datafusion/pull/21429#issuecomment-4204261752 🤖 Criterion benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21429#issuecomment-4204210371) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Lin

Re: [PR] Follow-up: remove interleave panic recovery after Arrow 58.1.0 [datafusion]

2026-04-07 Thread via GitHub
xudong963 merged PR #21436: URL: https://github.com/apache/datafusion/pull/21436

Re: [PR] Follow-up: remove interleave panic recovery after Arrow 58.1.0 [datafusion]

2026-04-07 Thread via GitHub
xudong963 commented on PR #21436: URL: https://github.com/apache/datafusion/pull/21436#issuecomment-4204269212 Thanks @alamb @kosiew

[PR] perf: Bitmap instead hll smaller int types [datafusion]

2026-04-07 Thread via GitHub
coderfender opened a new pull request, #21453: URL: https://github.com/apache/datafusion/pull/21453 ## Which issue does this PR close? - Closes #https://github.com/apache/datafusion/issues/1109 ## Rationale for this change ## What changes are included in
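
For 8- and 16-bit integer types, the whole value domain is small. A bitmap with one bit per possible value can therefore give an exact distinct count in fixed memory, where HyperLogLog gives an estimate. The Python sketch below illustrates that idea only. It is not the code in PR #21453, `exact_distinct_small_ints` is a hypothetical name, and whether it beats HLL on speed is what the PR's benchmarks measure.

```python
from typing import Iterable, Optional


def exact_distinct_small_ints(
    values: Iterable[Optional[int]], bits: int = 8, signed: bool = False
) -> int:
    """Count distinct 8- or 16-bit integers exactly using a fixed-size bitmap.

    The bitmap has one bit per possible value: 2**8 bits (32 bytes) for
    u8/i8 and 2**16 bits (8 KiB) for u16/i16. NULLs (None) are skipped.
    """
    size = 1 << bits
    # Shift signed values (e.g. -128..127) into the 0..size-1 bit range.
    offset = size // 2 if signed else 0
    seen = bytearray(size // 8)
    for v in values:
        if v is None:
            continue
        idx = v + offset
        if not 0 <= idx < size:
            raise ValueError(f"{v} does not fit in {bits} bits (signed={signed})")
        seen[idx >> 3] |= 1 << (idx & 7)
    # The distinct count is the number of set bits.
    return sum(bin(byte).count("1") for byte in seen)
```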

Re: [PR] perf: Bitmap instead hll smaller int types [datafusion]

2026-04-07 Thread via GitHub
coderfender commented on PR #21453: URL: https://github.com/apache/datafusion/pull/21453#issuecomment-4204329180 ``` Type | Change | Verdict; u8 | +20% slower | Regressed

Re: [PR] feat: add BuildHasher variants for hash_utils [datafusion]

2026-04-07 Thread via GitHub
adriangbot commented on PR #21429: URL: https://github.com/apache/datafusion/pull/21429#issuecomment-4204351725 🤖 Criterion benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21429#issuecomment-4204210371) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Re: [PR] feat: support InSubquery and Exists in Projection expressions [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on PR #21363: URL: https://github.com/apache/datafusion/pull/21363#issuecomment-4202849692 @crm26 Thanks for iterating on this! The comment I was suggesting you add "Optimization:" to was actually a different one :) My suggestion was attached to the "// Skip if no

Re: [I] `native_datafusion` doesn't use all available parallelism for scan [datafusion-comet]

2026-04-07 Thread via GitHub
comphead commented on issue #3817: URL: https://github.com/apache/datafusion-comet/issues/3817#issuecomment-4203253669 The workaround is to set `spark.sql.files.maxPartitionBytes` to 64M, which is half of the default value. The lower this setting, the smaller the difference between Spark/Comet numb
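
To see why lowering `spark.sql.files.maxPartitionBytes` changes the number of scan tasks, here is a simplified Python model of how Spark sizes file splits for its own file scans. It is modeled on `FilePartition.maxSplitBytes` and `getFilePartitions` and simplified: all files are splittable, and `spark.sql.files.minPartitionNum` is unset, so the default parallelism is used. It is an assumption that this is the mechanism behind the Spark/Comet difference described in the comment. Comet's `native_datafusion` planning is not modeled, and the function names are hypothetical.

```python
from typing import List, Sequence

MiB = 1024 * 1024


def max_split_bytes(
    file_sizes: Sequence[int],
    max_partition_bytes: int,
    open_cost: int,
    min_partition_num: int,
) -> int:
    # Each file is charged its length plus a fixed "open cost".
    total = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total // min_partition_num
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def num_scan_partitions(
    file_sizes: Sequence[int],
    max_partition_bytes: int = 128 * MiB,  # spark.sql.files.maxPartitionBytes default
    open_cost: int = 4 * MiB,  # spark.sql.files.openCostInBytes default
    min_partition_num: int = 8,  # assumed default parallelism (cores)
) -> int:
    split = max_split_bytes(file_sizes, max_partition_bytes, open_cost, min_partition_num)
    # Cut every file (assumed splittable) into chunks of at most `split` bytes.
    chunks: List[int] = []
    for size in file_sizes:
        for offset in range(0, size, split):
            chunks.append(min(split, size - offset))
    # Greedily pack chunks, largest first, into partitions of about `split` bytes.
    chunks.sort(reverse=True)
    partitions, current_size, current_count = 0, 0, 0
    for chunk in chunks:
        if current_count and current_size + chunk > split:
            partitions += 1
            current_size, current_count = 0, 0
        current_size += chunk + open_cost
        current_count += 1
    return partitions + (1 if current_count else 0)
```

Under these assumptions, a single 1 GiB file on 8 cores yields 8 partitions with the 128 MiB default and 16 with 64 MiB.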

Re: [I] DataFusion support for `TimestampWithOffset` [datafusion]

2026-04-07 Thread via GitHub
coderfender commented on issue #21116: URL: https://github.com/apache/datafusion/issues/21116#issuecomment-4204145346 I'll take a stab at it sometime, unless you are, @LiaCastaneda?

Re: [I] DataFusion support for `TimestampWithOffset` [datafusion]

2026-04-07 Thread via GitHub
coderfender commented on issue #21116: URL: https://github.com/apache/datafusion/issues/21116#issuecomment-4204143561 take

Re: [PR] fix: skips projection pruning for whole subtree [datafusion]

2026-04-07 Thread via GitHub
Dandandan merged PR #20545: URL: https://github.com/apache/datafusion/pull/20545

Re: [I] Bug: unnecessary columns projected and redundant filters pushed down [datafusion]

2026-04-07 Thread via GitHub
Dandandan closed issue #18816: Bug: unnecessary columns projected and redundant filters pushed down URL: https://github.com/apache/datafusion/issues/18816

Re: [I] Regression in json performance for local files [datafusion]

2026-04-07 Thread via GitHub
Dandandan commented on issue #21450: URL: https://github.com/apache/datafusion/issues/21450#issuecomment-4204172933 This was _after_ https://github.com/apache/datafusion/pull/20823 ?

Re: [I] [DISCUSSION] Future of Dynamic Filters Sync [datafusion]

2026-04-07 Thread via GitHub
stuhood commented on issue #21207: URL: https://github.com/apache/datafusion/issues/21207#issuecomment-4204179190 > These are some diagrams which expla @jayshrivastava : Can you post these?

Re: [PR] feat: add BuildHasher variants for hash_utils [datafusion]

2026-04-07 Thread via GitHub
Dandandan commented on PR #21429: URL: https://github.com/apache/datafusion/pull/21429#issuecomment-4204210371 run benchmark with_hashes

Re: [PR] feat: add BuildHasher variants for hash_utils [datafusion]

2026-04-07 Thread via GitHub
Dandandan commented on PR #21429: URL: https://github.com/apache/datafusion/pull/21429#issuecomment-4204215638 ``` large_utf8: single, no nulls 1.00 26.5±0.09µs ? ?/sec 1.37 36.2±0.41µs ? ?/sec sparse_union: multiple, no nulls 1.00

Re: [PR] perf: optimize object store requests when reading JSON [datafusion]

2026-04-07 Thread via GitHub
alamb commented on code in PR #20823: URL: https://github.com/apache/datafusion/pull/20823#discussion_r3046452282 ## datafusion/core/tests/datasource/object_store_access.rs: ## @@ -397,6 +400,348 @@ async fn query_partitioned_csv_file() { ); } +// ===

Re: [PR] perf: optimize object store requests when reading JSON [datafusion]

2026-04-07 Thread via GitHub
alamb merged PR #20823: URL: https://github.com/apache/datafusion/pull/20823

Re: [I] Release DataFusion 52.5.0 (minor) Release (Apr 2026) [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21078: URL: https://github.com/apache/datafusion/issues/21078#issuecomment-4202230070 I made an RC and started a vote: https://lists.apache.org/thread/scnnmsw6g200ckjj1rgx58oj9q6xzb2h

[PR] Update 53 upgrade guide to note release [datafusion]

2026-04-07 Thread via GitHub
alamb opened a new pull request, #21449: URL: https://github.com/apache/datafusion/pull/21449 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/19692 ## Rationale for this change @rluvaton noted some issues with the 53 upgrade gu

Re: [PR] deps: upgrade to DataFusion 53.0, Arrow to 58.1 [datafusion-comet]

2026-04-07 Thread via GitHub
mbutrovich merged PR #3629: URL: https://github.com/apache/datafusion-comet/pull/3629

Re: [I] chore: DataFusion 53.0.0 [datafusion-comet]

2026-04-07 Thread via GitHub
mbutrovich closed issue #3574: chore: DataFusion 53.0.0 URL: https://github.com/apache/datafusion-comet/issues/3574

Re: [PR] perf: Optimize `substr` for Utf8, LargeUtf8 [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21366: URL: https://github.com/apache/datafusion/pull/21366#discussion_r3046323361 ## datafusion/functions/src/unicode/substr.rs: ## @@ -326,17 +325,111 @@ fn string_view_substr( } } -fn string_substr<'a, V>(string_array: V, args: &[Ar

Re: [I] Support compound field access after subscripts, e.g. payload[1].a [datafusion]

2026-04-07 Thread via GitHub
townsag commented on issue #21384: URL: https://github.com/apache/datafusion/issues/21384#issuecomment-4201906146 take

Re: [PR] perf: optimise `first_value`, `last_value` aggregate function [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21383: URL: https://github.com/apache/datafusion/pull/21383#issuecomment-4201927185 Looks like there is some (reproducible) slowdown on some of the null cases: ``` group main

Re: [PR] Add configurable UNION DISTINCT to FILTER rewrite optimization [datafusion]

2026-04-07 Thread via GitHub
adriangbot commented on PR #21075: URL: https://github.com/apache/datafusion/pull/21075#issuecomment-4200525894 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21075#issuecomment-4200500846) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-
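
The rewrite named in this PR's title can be illustrated under stated assumptions. Suppose both branches of a `UNION` (which implies `DISTINCT`) scan the same table with the same projection and deterministic predicates. Then `SELECT cols FROM t WHERE p1 UNION SELECT cols FROM t WHERE p2` returns the same rows as `SELECT DISTINCT cols FROM t WHERE p1 OR p2`, which needs only one scan. This is a hypothetical Python model of that equivalence, not the optimizer rule in PR #21075, and it does not capture when the rule is enabled. Python booleans stand in for SQL predicates; a NULL predicate result drops the row in both forms, just as `False` does here.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]
Key = Tuple[object, ...]


def project(row: Row, cols: Sequence[str]) -> Key:
    return tuple(row[c] for c in cols)


def union_distinct(
    rows: Sequence[Row], cols: Sequence[str], p1: Predicate, p2: Predicate
) -> Set[Key]:
    # SELECT cols FROM t WHERE p1 UNION SELECT cols FROM t WHERE p2 (two scans)
    left = {project(r, cols) for r in rows if p1(r)}
    right = {project(r, cols) for r in rows if p2(r)}
    return left | right


def distinct_with_or_filter(
    rows: Sequence[Row], cols: Sequence[str], p1: Predicate, p2: Predicate
) -> Set[Key]:
    # SELECT DISTINCT cols FROM t WHERE p1 OR p2 (one scan)
    return {project(r, cols) for r in rows if p1(r) or p2(r)}
```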

Re: [I] Release DataFusion `53.1.0` (minor) (Apr 2026) [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21079: URL: https://github.com/apache/datafusion/issues/21079#issuecomment-4201932257 Thanks @comphead -- We should also make sure that anything backported into 52.5.0 is ported to branch-53. I will do that review tomorrow and make any additional backports needed

Re: [PR] [BRANCH-52] fix: foreign inner ffi types [datafusion]

2026-04-07 Thread via GitHub
alamb merged PR #21439: URL: https://github.com/apache/datafusion/pull/21439

Re: [PR] [BRANCH-52] fix: foreign inner ffi types [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21439: URL: https://github.com/apache/datafusion/pull/21439#issuecomment-4201934829 Ok, I'll merge this one in and make an RC

Re: [PR] [BRANCH-52] fix: foreign inner ffi types [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21439: URL: https://github.com/apache/datafusion/pull/21439#issuecomment-4201935202 Thanks @timsaucer

Re: [PR] deps: upgrade to DataFusion 53.0, Arrow to 58.1 [datafusion-comet]

2026-04-07 Thread via GitHub
comphead commented on code in PR #3629: URL: https://github.com/apache/datafusion-comet/pull/3629#discussion_r3046115551 ## native/core/src/execution/jni_api.rs: ## @@ -393,6 +393,11 @@ fn prepare_datafusion_session_context( // register UDFs from datafusion-spark crate fn re

Re: [PR] fix: Use codepoints in `lpad`, `rpad`, `translate` [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21405: URL: https://github.com/apache/datafusion/pull/21405#discussion_r3045972591 ## datafusion/functions/src/unicode/rpad.rs: ## @@ -492,16 +484,15 @@ where builder.append_value(""); }
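
The codepoint-versus-byte distinction in this PR's title shows up with non-ASCII input. `'é'` is one codepoint but two bytes in UTF-8, so a byte-counting `lpad` would under-pad it. This Python sketch assumes PostgreSQL-style semantics: pad on the left, truncate on the right when the input is already long enough, and return the input unchanged for an empty fill. It is not DataFusion's implementation. Codepoints are also not grapheme clusters, so combining marks still count separately.

```python
def lpad(s: str, length: int, fill: str = " ") -> str:
    """Left-pad s to `length` codepoints (assumed PostgreSQL-style semantics).

    Python str indexing already works on codepoints, so a character such as
    'é' (2 bytes in UTF-8) counts as one position.
    """
    if length <= 0:
        return ""
    if len(s) >= length:
        return s[:length]  # truncate on the right
    if not fill:
        return s
    need = length - len(s)
    # Repeat the fill enough times, then cut it to exactly `need` codepoints.
    pad = (fill * (need // len(fill) + 1))[:need]
    return pad + s
```

For example, `lpad("é", 3, "ü")` gives `"üüé"`: three codepoints, but six bytes in UTF-8.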

Re: [PR] feat: Add Spark-compatible `encode` function to datafusion-spark [datafusion]

2026-04-07 Thread via GitHub
JeelRajodiya commented on PR #21331: URL: https://github.com/apache/datafusion/pull/21331#issuecomment-4200339441 Hey @xanderbailey, do I need to mention the maintainers for review? I'm planning to open more PRs for implementing other functions, but I'm waiting for this to get merged.

Re: [PR] perf: Optimize `split_part` for `Utf8View` [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21420: URL: https://github.com/apache/datafusion/pull/21420#discussion_r3046987125 ## datafusion/sqllogictest/test_files/string/string_view.slt: ## @@ -954,6 +954,71 @@ SELECT arrow_typeof(split_part(arrow_cast('a.b.c', 'Utf8View'), '.', 2));

Re: [PR] [BRANCH-52] fix: foreign inner ffi types [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21439: URL: https://github.com/apache/datafusion/pull/21439#issuecomment-4201222097 @timsaucer is this one ready to review / merge?

Re: [PR] Introduce Morselizer API, rewrite `ParquetOpener` to `ParquetMorselizer` [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21327: URL: https://github.com/apache/datafusion/pull/21327#issuecomment-4200253734 > In case it's helpful my attempt (disclaimer Codex assisted, late last night) was #21427 / [alamb#36](https://github.com/alamb/datafusion/pull/36). > > One observation is that I

Re: [I] Extend `sqllogictest` framework to uptake custom `datafusion.format.*` settings [datafusion]

2026-04-07 Thread via GitHub
erenavsarogullari commented on issue #21447: URL: https://github.com/apache/datafusion/issues/21447#issuecomment-4201610311 take

Re: [I] Release DataFusion `53.0.0` (Feb 2026 / Mar 2026) [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #19692: URL: https://github.com/apache/datafusion/issues/19692#issuecomment-4202286690 > FYI, the migration guide says 53.0.0 was not released yet and it misses the following breaking changes: I made a PR to fix this: - https://github.com/apache/datafusion/pu

Re: [PR] perf: optimise `first_value`, `last_value` aggregate function [datafusion]

2026-04-07 Thread via GitHub
theirix commented on PR #21383: URL: https://github.com/apache/datafusion/pull/21383#issuecomment-4202330345 > Looks like there is some (reproducible) slowdown on some of the null cases: > > ``` > group main

Re: [PR] chore: fix native shuffle for batches with no columns and 0 row count [datafusion-comet]

2026-04-07 Thread via GitHub
comphead commented on PR #3858: URL: https://github.com/apache/datafusion-comet/pull/3858#issuecomment-4202337476 Closed in favor of #3893

Re: [PR] chore: fix native shuffle for batches with no columns and 0 row count [datafusion-comet]

2026-04-07 Thread via GitHub
comphead closed pull request #3858: chore: fix native shuffle for batches with no columns and 0 row count URL: https://github.com/apache/datafusion-comet/pull/3858

Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-04-07 Thread via GitHub
comphead merged PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842

Re: [I] `native_datafusion` doesn't report input metrics [datafusion-comet]

2026-04-07 Thread via GitHub
comphead closed issue #3735: `native_datafusion` doesn't report input metrics URL: https://github.com/apache/datafusion-comet/issues/3735

Re: [PR] Add missing Dataframe functions [datafusion-python]

2026-04-07 Thread via GitHub
timsaucer commented on PR #1472: URL: https://github.com/apache/datafusion-python/pull/1472#issuecomment-4201194397 > Related to #1340 This doesn't cover `find_qualified_cols` from [apache/datafusion#19549](https://github.com/apache/datafusion/pull/19549) (which should probably have a nice

Re: [PR] perf: Optimize `split_part` for `Utf8View` [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on PR #21420: URL: https://github.com/apache/datafusion/pull/21420#issuecomment-4201221754 > It might be worth adding unit tests for sliced StringViewArray inputs (non-zero offset) and results landing exactly at the 12-byte inline/out-of-line boundary. These are the two
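
The 12-byte boundary mentioned in this comment comes from the Arrow string-view layout. Each view is 16 bytes. Strings of at most 12 bytes are stored inline; longer ones keep a 4-byte prefix plus a buffer index and an offset into a data buffer. This Python sketch of that encoding follows the Arrow columnar format and is an illustration only: `encode_view` and `is_inline` are hypothetical helpers, not arrow-rs APIs. It shows the two cases a boundary test would exercise: exactly 12 bytes (inline) and 13 bytes (out of line).

```python
import struct

INLINE_MAX = 12  # data of up to 12 bytes lives inside the 16-byte view itself


def encode_view(data: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Encode one Arrow string-view entry as 16 little-endian bytes.

    len <= 12: [length: i32][data, zero-padded to 12 bytes]
    len  > 12: [length: i32][first 4 data bytes][buffer_index: i32][offset: i32]
    """
    n = len(data)
    if n <= INLINE_MAX:
        return struct.pack("<i", n) + data.ljust(INLINE_MAX, b"\x00")
    return struct.pack("<i4sii", n, data[:4], buffer_index, offset)


def is_inline(view: bytes) -> bool:
    (n,) = struct.unpack_from("<i", view, 0)
    return n <= INLINE_MAX
```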

Re: [PR] feat: Add Spark-compatible `encode` function to datafusion-spark [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21331: URL: https://github.com/apache/datafusion/pull/21331#issuecomment-4201235790 Thanks @xanderbailey and @JeelRajodiya -- the PR load is pretty intense! I started the CI for this PR

Re: [I] Optimize object store accesses for the CSV scanner [datafusion]

2026-04-07 Thread via GitHub
CuteChuanChuan commented on issue #21419: URL: https://github.com/apache/datafusion/issues/21419#issuecomment-4200372756 Hi @ariel-miculas , could I give this a try?

Re: [PR] Add ExpressionAnalyzer for pluggable expression-level statistics estimation [datafusion]

2026-04-07 Thread via GitHub
asolimando commented on code in PR #21122: URL: https://github.com/apache/datafusion/pull/21122#discussion_r3046225207 ## datafusion/core/src/physical_planner.rs: ## @@ -2898,7 +2920,11 @@ impl DefaultPhysicalPlanner { .into_iter() .map(

Re: [PR] fix: preserve subquery structure when unparsing SubqueryAlias over Ag… [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21099: URL: https://github.com/apache/datafusion/pull/21099#issuecomment-4202060281 I kicked off the CI tests

Re: [PR] fix: raise AmbiguousReference error for duplicate column names in subquery [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21236: URL: https://github.com/apache/datafusion/pull/21236#issuecomment-4202049546 > @alamb @xudong963 I see that the test results are now available. I'm not entirely sure how to interpret them, but based on a rough comparison with the baseline branch, it appears the

Re: [PR] fix: Use codepoints in `lpad`, `rpad`, `translate` [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21405: URL: https://github.com/apache/datafusion/pull/21405#discussion_r3046248943 ## datafusion/functions/src/unicode/lpad.rs: ## @@ -270,22 +269,19 @@ fn lpad_scalar_unicode<'a, V: StringArrayType<'a> + Copy, T: OffsetSizeTrait>( let d

Re: [PR] feat: Support Spark expression hours [datafusion-comet]

2026-04-07 Thread via GitHub
0lai0 commented on code in PR #3804: URL: https://github.com/apache/datafusion-comet/pull/3804#discussion_r3047127823 ## native/spark-expr/src/datetime_funcs/hours.rs: ## @@ -0,0 +1,299 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] perf: Optimize `substr` for Utf8, LargeUtf8 [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21366: URL: https://github.com/apache/datafusion/pull/21366#discussion_r3046188974 ## datafusion/functions/src/unicode/substr.rs: ## @@ -326,17 +325,111 @@ fn string_view_substr( } } -fn string_substr<'a, V>(string_array: V, args: &[Ar

Re: [PR] Add configurable UNION DISTINCT to FILTER rewrite optimization [datafusion]

2026-04-07 Thread via GitHub
comphead commented on PR #21075: URL: https://github.com/apache/datafusion/pull/21075#issuecomment-4200497020 run benchmark

Re: [PR] fix: raise AmbiguousReference error for duplicate column names in subquery [datafusion]

2026-04-07 Thread via GitHub
xiedeyantu commented on PR #21236: URL: https://github.com/apache/datafusion/pull/21236#issuecomment-4202582656 > I think we need to avoid performance regressions before merging this Of course! I’ll think about other possible ways to fix this issue. Thanks for your expert feedback.

Re: [I] Poor performance of json scan for local files [datafusion]

2026-04-07 Thread via GitHub
ariel-miculas commented on issue #21450: URL: https://github.com/apache/datafusion/issues/21450#issuecomment-4202614703 It would be nice to have this first: https://github.com/apache/datafusion/issues/21446

Re: [PR] perf: optimize object store requests when reading JSON [datafusion]

2026-04-07 Thread via GitHub
ariel-miculas commented on PR #20823: URL: https://github.com/apache/datafusion/pull/20823#issuecomment-4202604980 I ran some tests with clickbench, reading from local files is worse: ``` [ec2-user@ip-172-31-0-185 datafusion]$ ./benchmarks/bench.sh compare json-test-on-main test-json-i

[I] Poor performance of json scan for local files [datafusion]

2026-04-07 Thread via GitHub
ariel-miculas opened a new issue, #21450: URL: https://github.com/apache/datafusion/issues/21450 I ran some tests with clickbench, reading from local files is worse: ``` [ec2-user@ip-172-31-0-185 datafusion]$ ./benchmarks/bench.sh compare json-test-on-main test-json-improvement Comp

Re: [PR] feat: support InSubquery and Exists in Projection expressions [datafusion]

2026-04-07 Thread via GitHub
crm26 commented on PR #21363: URL: https://github.com/apache/datafusion/pull/21363#issuecomment-4202658920 Found and fixed the CI failure, pushed as 92884480a. The `cargo test (amd64)` / `(macos-aarch64)` failures came from a stale negative test in `predicates.slt:845` that asserted `
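
When `IN (subquery)` or `EXISTS` appears in a projection (for example `SELECT x IN (SELECT y FROM t2) FROM t1`), it produces a boolean value per row instead of filtering rows. SQL's three-valued logic therefore becomes visible in the output. Below is a minimal Python sketch of the standard semantics for an uncorrelated subquery, with `None` standing in for NULL. It is not DataFusion's implementation, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def sql_in(x: Optional[object], subquery_values: Iterable[Optional[object]]) -> Optional[bool]:
    """Evaluate `x IN (subquery)` with SQL three-valued logic (None = NULL)."""
    values = list(subquery_values)
    if not values:
        return False  # IN over an empty result is FALSE, even when x is NULL
    if x is None:
        return None
    if any(v == x for v in values if v is not None):
        return True
    # No match: the answer is unknown (NULL) if the result contains a NULL.
    return None if any(v is None for v in values) else False


def sql_exists(subquery_rows: Iterable[object]) -> bool:
    # EXISTS is two-valued: TRUE iff the subquery returns at least one row.
    return any(True for _ in subquery_rows)
```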

Re: [PR] perf: optimise `first_value`, `last_value` aggregate function [datafusion]

2026-04-07 Thread via GitHub
adriangbot commented on PR #21383: URL: https://github.com/apache/datafusion/pull/21383#issuecomment-420013 🤖 Criterion benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21383#issuecomment-4200118375) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Lin

Re: [PR] test: Add `datafusion.format.*` configs test coverage [datafusion]

2026-04-07 Thread via GitHub
alamb commented on code in PR #21355: URL: https://github.com/apache/datafusion/pull/21355#discussion_r3046068833 ## datafusion/sqllogictest/test_files/set_variable.slt: ## @@ -379,6 +379,204 @@ RESET datafusion.execution.batches_size statement error DataFusion error: Invalid o

Re: [PR] [branch-52] Update version to 52.5.0 and add changelog [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21407: URL: https://github.com/apache/datafusion/pull/21407#issuecomment-4201996373 I merged up and updated the changelog

Re: [I] Release DataFusion 52.5.0 (minor) Release (Apr 2026) [datafusion]

2026-04-07 Thread via GitHub
alamb commented on issue #21078: URL: https://github.com/apache/datafusion/issues/21078#issuecomment-4202007656 Ok, we have the content in. I updated the version bump / changelog - https://github.com/apache/datafusion/pull/21407 Once that is good I'll move on to making a r

Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-04-07 Thread via GitHub
comphead commented on PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842#issuecomment-4202341483 Thanks @mbutrovich for the review

Re: [PR] fix: preserve subquery structure when unparsing SubqueryAlias over Ag… [datafusion]

2026-04-07 Thread via GitHub
yonatan-sevenai commented on PR #21099: URL: https://github.com/apache/datafusion/pull/21099#issuecomment-4202190009 > I kicked off the CI tests Merged in Master to make merge easier, but you might need to rerun CI.

Re: [I] Add `datafusion.format.*` configs test coverage [datafusion]

2026-04-07 Thread via GitHub
alamb closed issue #21354: Add `datafusion.format.*` configs test coverage URL: https://github.com/apache/datafusion/issues/21354

Re: [PR] feat: support InSubquery and Exists in Projection expressions [datafusion]

2026-04-07 Thread via GitHub
neilconway commented on code in PR #21363: URL: https://github.com/apache/datafusion/pull/21363#discussion_r3047256748 ## datafusion/optimizer/src/decorrelate_predicate_subquery.rs: ## @@ -69,53 +70,113 @@ impl OptimizerRule for DecorrelatePredicateSubquery { })?

Re: [PR] Reapply "Fix/support duplicate column names" [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21260: URL: https://github.com/apache/datafusion/pull/21260#issuecomment-4201251616 > Good news on step 3 — the Omega359 fork features were already upstreamed to risinglightdb/sqllogictest-rs in v0.24.0 ([PR #237](https://github.com/risinglightdb/sqllogictest-rs/pull/

Re: [PR] Blog: Row-Level DML in DataFusion [datafusion-site]

2026-04-07 Thread via GitHub
alamb commented on PR #136: URL: https://github.com/apache/datafusion-site/pull/136#issuecomment-4201243286 Oh no, I just got this ping. Are you still interested in working on this PR @ethan-tyler ? If so I will review it. I am sorry that i lost track of it

Re: [PR] perf: optimise `first_value`, `last_value` aggregate function [datafusion]

2026-04-07 Thread via GitHub
adriangbot commented on PR #21383: URL: https://github.com/apache/datafusion/pull/21383#issuecomment-4200252248 🤖 Criterion benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21383#issuecomment-4200118375) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Re: [PR] deps: upgrade to DataFusion 53.0, Arrow to 58.1 [datafusion-comet]

2026-04-07 Thread via GitHub
comphead commented on code in PR #3629: URL: https://github.com/apache/datafusion-comet/pull/3629#discussion_r3046143486 ## native/core/src/parquet/schema_adapter.rs: ## @@ -314,19 +328,40 @@ impl SparkPhysicalExprAdapter { .find(|f| f.name().eq_ignore_a

Re: [PR] feat: support InSubquery and Exists in Projection expressions [datafusion]

2026-04-07 Thread via GitHub
crm26 commented on PR #21363: URL: https://github.com/apache/datafusion/pull/21363#issuecomment-4201754197 Thanks for the thorough review, @neilconway! Addressed all three points in the latest push (commit 12a12bcd1): 1. **Optimization: prefix** — Added to the alias preservation comme

Re: [PR] [branch-52] chore: update deps for cargo audit [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21415: URL: https://github.com/apache/datafusion/pull/21415#issuecomment-4199079517 Thank you @xudong963 🙏

Re: [PR] feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) [datafusion]

2026-04-07 Thread via GitHub
adriangb commented on PR #21182: URL: https://github.com/apache/datafusion/pull/21182#issuecomment-4199080230 certainly worthy of a blog post!

Re: [PR] [branch-52] chore: update deps for cargo audit [datafusion]

2026-04-07 Thread via GitHub
alamb merged PR #21415: URL: https://github.com/apache/datafusion/pull/21415

Re: [PR] [branch-52] Update version to 52.5.0 and add changelog [datafusion]

2026-04-07 Thread via GitHub
alamb commented on PR #21407: URL: https://github.com/apache/datafusion/pull/21407#issuecomment-4199077554 Thanks @xudong963 -- I will wait for all the rest of the content to get into branch-52 and then refresh the changelog and merge this PR

Re: [PR] feat: add is_nullable scalar UDF [datafusion]

2026-04-07 Thread via GitHub
martin-g commented on PR #21387: URL: https://github.com/apache/datafusion/pull/21387#issuecomment-4199118424 > "use arrow_field(...)['nullable']" I like it!

Re: [PR] Skip per-row filter evaluation when all row groups are fully matched [datafusion]

2026-04-07 Thread via GitHub
adriangb commented on PR #21372: URL: https://github.com/apache/datafusion/pull/21372#issuecomment-4199116661 This makes a lot of sense to me. We have a hacky version of this internally, it's especially effective for filters/queries like `ts > '2026-04-05T00:15:00Z' where many files will ha