Re: [PR] feat: implement partition_statistics for HashJoinExec [datafusion]

2026-02-07 Thread via GitHub
0xPoe commented on PR #16956: URL: https://github.com/apache/datafusion/pull/16956#issuecomment-3864349100 > Hello, is this pull request still being worked on? I would be happy to take over if busy Aside from the merge conflicts, I believe the PR mainly awaits review and feedback.

Re: [PR] IN LIST optims [datafusion]

2026-02-07 Thread via GitHub
geoffreyclaude commented on PR #19390: URL: https://github.com/apache/datafusion/pull/19390#issuecomment-3864123396 run benchmark in_list -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] IN LIST optims [datafusion]

2026-02-07 Thread via GitHub
alamb-ghbot commented on PR #19390: URL: https://github.com/apache/datafusion/pull/19390#issuecomment-3864123463 🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running Linux aal-dev 6.

Re: [I] [COMET NATIVE WRITER] INSERT INTO TABLE - complex type but different names [datafusion-comet]

2026-02-07 Thread via GitHub
ShivamSoni20 commented on issue #3426: URL: https://github.com/apache/datafusion-comet/issues/3426#issuecomment-3864122731 hey @coderfender I am ready for the pr please assign me

[PR] feat: add support for `GetTimestamp`, `parse_to_date`, `parse_to_timestamp` expressions [datafusion-comet]

2026-02-07 Thread via GitHub
rafafrdz opened a new pull request, #3438: URL: https://github.com/apache/datafusion-comet/pull/3438 # Summary - Add native Comet support for Spark's `GetTimestamp` expression, which enables the following: - Add native Comet support for Spark's `ParseToDate` expression, and therefo

Re: [PR] feat: add support for `date_to_parse` expression [datafusion-comet]

2026-02-07 Thread via GitHub
rafafrdz closed pull request #3267: feat: add support for `date_to_parse` expression URL: https://github.com/apache/datafusion-comet/pull/3267

[PR] chore(deps): bump time from 0.3.46 to 0.3.47 [datafusion-ballista]

2026-02-07 Thread via GitHub
dependabot[bot] opened a new pull request, #1446: URL: https://github.com/apache/datafusion-ballista/pull/1446 Bumps [time](https://github.com/time-rs/time) from 0.3.46 to 0.3.47. Release notes Sourced from [time's releases](https://github.com/time-rs/time/releases). v0.3.47

[PR] chore: add confirmation before tarball is released [datafusion-comet]

2026-02-07 Thread via GitHub
milenkovicm opened a new pull request, #3439: URL: https://github.com/apache/datafusion-comet/pull/3439 ## Which issue does this PR close? Closes #. ## Rationale for this change Add confirmation dialogue to confirm tarball release, as it could be triggered by acciden

Re: [PR] refactor: Change TableScan.projection from indices to expressions [datafusion]

2026-02-07 Thread via GitHub
alamb commented on code in PR #20091: URL: https://github.com/apache/datafusion/pull/20091#discussion_r2777470270 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2682,8 +2688,9 @@ pub struct TableScan { pub table_name: TableReference, /// The source of the table

Re: [PR] fix: Fix panic in regexp_like() [datafusion]

2026-02-07 Thread via GitHub
neilconway commented on PR #20200: URL: https://github.com/apache/datafusion/pull/20200#issuecomment-3865650975 > Can we perhaps add the regression test to .slt? @alamb Sure, done!

Re: [PR] Add `StructArray` and `RunArray` benchmark tests to `with_hashes` [datafusion]

2026-02-07 Thread via GitHub
notashes commented on code in PR #20182: URL: https://github.com/apache/datafusion/pull/20182#discussion_r2778163891 ## datafusion/common/benches/with_hashes.rs: ## @@ -68,11 +71,25 @@ fn criterion_benchmark(c: &mut Criterion) { name: "dictionary_utf8_int32",

Re: [PR] Add `StructArray` and `RunArray` benchmark tests to `with_hashes` [datafusion]

2026-02-07 Thread via GitHub
notashes commented on code in PR #20182: URL: https://github.com/apache/datafusion/pull/20182#discussion_r2778164988 ## datafusion/common/benches/with_hashes.rs: ## @@ -205,5 +222,123 @@ where Arc::new(array) } +/// Create a StructArray with multiple columns +fn create_s

Re: [PR] Add `StructArray` and `RunArray` benchmark tests to `with_hashes` [datafusion]

2026-02-07 Thread via GitHub
notashes commented on code in PR #20182: URL: https://github.com/apache/datafusion/pull/20182#discussion_r2778165967 ## datafusion/common/benches/with_hashes.rs: ## @@ -205,5 +222,123 @@ where Arc::new(array) } +/// Create a StructArray with multiple columns +fn create_s

[PR] enable dynamic filtering for file hash partitioned data [datafusion]

2026-02-07 Thread via GitHub
gene-bordegaray opened a new pull request, #20217: URL: https://github.com/apache/datafusion/pull/20217 ## Which issue does this PR close? - Closes #20195. ## Rationale for this change ## What changes are included in this PR? ## Are these ch

[PR] fix: Avoid assertion failure on divide-by-zero [datafusion]

2026-02-07 Thread via GitHub
neilconway opened a new pull request, #20216: URL: https://github.com/apache/datafusion/pull/20216 A WHERE clause like `4==(3/0)` will not be optimized away, but will result in `context.selectivity` being `None`. ## Which issue does this PR close? - Closes #20215 ## Are

Re: [PR] fix: Fix panic in regexp_like() [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20200: URL: https://github.com/apache/datafusion/pull/20200#issuecomment-3865511221 Can we perhaps add the regression test to .slt?

Re: [PR] unify the prettier versions [datafusion]

2026-02-07 Thread via GitHub
alamb merged PR #20167: URL: https://github.com/apache/datafusion/pull/20167

Re: [I] Enable `clone_on_ref_ptr` Clippy lint for the whole workspace [datafusion]

2026-02-07 Thread via GitHub
un1u3 commented on issue #17083: URL: https://github.com/apache/datafusion/issues/17083#issuecomment-3865883444 > I would like to take a look at this issue. @un1u3 is that ok ? Go on

[PR] chore: Unbreak doctest CI [datafusion]

2026-02-07 Thread via GitHub
neilconway opened a new pull request, #20218: URL: https://github.com/apache/datafusion/pull/20218 File was renamed and split into smaller files as part of #20183. ## Which issue does this PR close? ## Rationale for this change CI is failing with: ```

Re: [PR] perf: Add batch coalescing in BufBatchWriter to reduce IPC schema overhead [datafusion-comet]

2026-02-07 Thread via GitHub
andygrove commented on PR #3441: URL: https://github.com/apache/datafusion-comet/pull/3441#issuecomment-3865888369 @EmilyMatt fyi

Re: [PR] chore: Unbreak doctest CI [datafusion]

2026-02-07 Thread via GitHub
neilconway commented on PR #20218: URL: https://github.com/apache/datafusion/pull/20218#issuecomment-3865894950 Note that there are still various references to a file called `upgrading.md` in the source tree; I haven't attempted to fix all those as part of this PR. ``` $ ag upgradi

Re: [PR] chore: Unbreak doctest CI [datafusion]

2026-02-07 Thread via GitHub
neilconway commented on PR #20218: URL: https://github.com/apache/datafusion/pull/20218#issuecomment-3865895816 PTAL @avantgardnerio @alamb

[I] Dynamic filter applied to the wrong table when using subqueries [datafusion]

2026-02-07 Thread via GitHub
nuno-faria opened a new issue, #20213: URL: https://github.com/apache/datafusion/issues/20213 ### Describe the bug When a subquery (with an inner join) is used on a join, the dynamic filter generated by the external join can be incorrectly pushed to both tables of the nested one, if

Re: [I] Break upgrade guides into separate pages [datafusion]

2026-02-07 Thread via GitHub
avantgardnerio closed issue #20155: Break upgrade guides into separate pages URL: https://github.com/apache/datafusion/issues/20155

Re: [PR] Break upgrade guides into separate pages [datafusion]

2026-02-07 Thread via GitHub
avantgardnerio merged PR #20183: URL: https://github.com/apache/datafusion/pull/20183

Re: [PR] Fix name tracker [datafusion]

2026-02-07 Thread via GitHub
xanderbailey commented on PR #19856: URL: https://github.com/apache/datafusion/pull/19856#issuecomment-3865137118 Thanks for the review, I have a couple of failing test cases here that I need to look into. Will take a look on Monday and report back.

Re: [I] Unpin infra repo commit hash in `Makefile` [datafusion-site]

2026-02-07 Thread via GitHub
Abhinandankaushik commented on issue #144: URL: https://github.com/apache/datafusion-site/issues/144#issuecomment-3865374288 hey @Jefffrey i will raise a pr for this issue very soon but before that there is need to merge this pr #148 so that i will not need to merge this branch furt

Re: [PR] feat: implement cast from whole numbers to binary format and bool to decimal [datafusion-comet]

2026-02-07 Thread via GitHub
coderfender commented on PR #3083: URL: https://github.com/apache/datafusion-comet/pull/3083#issuecomment-3865687904 Rebased with main and fixed merge conflicts

[PR] Add batch coalescing in BufBatchWriter to reduce IPC schema overhead [datafusion-comet]

2026-02-07 Thread via GitHub
andygrove opened a new pull request, #3441: URL: https://github.com/apache/datafusion-comet/pull/3441 ## Which issue does this PR close? Closes #. ## Rationale for this change In the multi-partition shuffle path, each small batch becomes its own IPC block with fu
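
The coalescing idea in this PR description can be illustrated with a minimal sketch. This is not Comet's `BufBatchWriter`; `Batch`, `Coalescer`, and `target_rows` are hypothetical names, and a plain `Vec<i64>` stands in for an Arrow record batch. The point is only that small inputs are buffered and emitted as one larger batch, so whatever per-block cost the writer pays is paid less often.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i64>;

/// Buffers small batches and emits one combined batch once at least
/// `target_rows` rows have accumulated. (Illustrative only.)
struct Coalescer {
    target_rows: usize,
    buffered: Vec<i64>,
}

impl Coalescer {
    fn new(target_rows: usize) -> Self {
        Self { target_rows, buffered: Vec::new() }
    }

    /// Add a batch; returns a combined batch when the threshold is reached.
    fn push(&mut self, batch: Batch) -> Option<Batch> {
        self.buffered.extend(batch);
        if self.buffered.len() >= self.target_rows {
            // Hand out the accumulated rows and leave an empty buffer behind.
            Some(std::mem::take(&mut self.buffered))
        } else {
            None
        }
    }

    /// Flush whatever remains at end of input.
    fn finish(&mut self) -> Option<Batch> {
        if self.buffered.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffered))
        }
    }
}
```

A caller would write each `Some(batch)` returned by `push` as one block and call `finish` once after the last input batch.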

[PR] build(deps): bump arrow-select from 57.2.0 to 57.3.0 [datafusion-python]

2026-02-07 Thread via GitHub
dependabot[bot] opened a new pull request, #1373: URL: https://github.com/apache/datafusion-python/pull/1373 Bumps [arrow-select](https://github.com/apache/arrow-rs) from 57.2.0 to 57.3.0. Release notes Sourced from https://github.com/apache/arrow-rs/releases arrow-select's rele

Re: [I] Enable `clone_on_ref_ptr` Clippy lint for the whole workspace [datafusion]

2026-02-07 Thread via GitHub
JoshElkind commented on issue #17083: URL: https://github.com/apache/datafusion/issues/17083#issuecomment-3865185173 take

Re: [I] Enable `clone_on_ref_ptr` Clippy lint for the whole workspace [datafusion]

2026-02-07 Thread via GitHub
JoshElkind commented on issue #17083: URL: https://github.com/apache/datafusion/issues/17083#issuecomment-3865186177 I would like to take a look at this issue. @un1u3 is that ok ?

[I] Proposal: Port most microbenchmarks to PySpark [datafusion-comet]

2026-02-07 Thread via GitHub
andygrove opened a new issue, #3440: URL: https://github.com/apache/datafusion-comet/issues/3440 ### What is the problem the feature request solves? We have some very useful microbenchmarks implemented in Scala and tightly integrated with Comet. I would like to propose moving most of

Re: [PR] Break upgrade guides into separate pages [datafusion]

2026-02-07 Thread via GitHub
mishop-15 commented on PR #20183: URL: https://github.com/apache/datafusion/pull/20183#issuecomment-3865127405 > Definitely an improvement, thank you! I can click into each guide and do `ctrl-f` just as I hoped! > > Unfortunately when I click on "Upgrade Guides", I see this: >

Re: [PR] Break upgrade guides into separate pages [datafusion]

2026-02-07 Thread via GitHub
avantgardnerio commented on PR #20183: URL: https://github.com/apache/datafusion/pull/20183#issuecomment-3865135636 > added the version numbers. Fantastic, thank you @mishop-15 ! I really appreciate your help! I've clicked "merge when ready" so I think it should be part of `main` shor

[PR] build(deps): bump arrow from 57.2.0 to 57.3.0 [datafusion-python]

2026-02-07 Thread via GitHub
dependabot[bot] opened a new pull request, #1374: URL: https://github.com/apache/datafusion-python/pull/1374 Bumps [arrow](https://github.com/apache/arrow-rs) from 57.2.0 to 57.3.0. Release notes Sourced from [arrow's releases](https://github.com/apache/arrow-rs/releases). ar

[I] Enable dynamic filters for nested joins [datafusion]

2026-02-07 Thread via GitHub
nuno-faria opened a new issue, #20214: URL: https://github.com/apache/datafusion/issues/20214 ### Is your feature request related to a problem or challenge? I would like to enable dynamic filter pushdown for inner joins which use other joins as subqueries. For example: ```sql

Re: [PR] feat: Support planning subqueries with OuterReferenceColumn belongs to non-adjacent outer relations [datafusion]

2026-02-07 Thread via GitHub
mkleen commented on PR #19930: URL: https://github.com/apache/datafusion/pull/19930#issuecomment-3865155312 @alamb @duongcongtoai Could you please do one more review?

[I] Assert failure on trivial WHERE with divide-by-zero [datafusion]

2026-02-07 Thread via GitHub
neilconway opened a new issue, #20215: URL: https://github.com/apache/datafusion/issues/20215 ### Describe the bug A trivially unsatisfiable WHERE clause with a divide-by-zero can result in `context.selectivity` being `None`, which fails an assert in physical expr analysis. ##

Re: [I] Unpin infra repo commit hash in `Makefile` [datafusion-site]

2026-02-07 Thread via GitHub
Jefffrey commented on issue #144: URL: https://github.com/apache/datafusion-site/issues/144#issuecomment-3865971376 I don't see why this issue should be blocked by #148; they are entirely separate. I would recommend working on this issue on a separate branch.

[I] date_bin() panics on large inputs [datafusion]

2026-02-07 Thread via GitHub
neilconway opened a new issue, #20219: URL: https://github.com/apache/datafusion/issues/20219 ### Describe the bug Found via fuzzing. ### To Reproduce ``` select DATE_BIN('1637426858', TO_TIMESTAMP_MILLIS(1040292460), TIMESTAMP '1984-01-07 00:00:00'); ``` Yi

[I] LIKE fails on nested value [datafusion]

2026-02-07 Thread via GitHub
neilconway opened a new issue, #20210: URL: https://github.com/apache/datafusion/issues/20210 ### Describe the bug `datafusion_physical_expr_common::datum::apply_cmp` does not handle LikeMatch, ILikeMatch, etc for nested data types. ### To Reproduce ``` CREATE TABLE t

Re: [PR] feat: Support planning subqueries with OuterReferenceColumn belongs to non-adjacent outer relations [datafusion]

2026-02-07 Thread via GitHub
mkleen commented on code in PR #19930: URL: https://github.com/apache/datafusion/pull/19930#discussion_r220721 ## datafusion/sql/tests/sql_integration.rs: ## @@ -995,15 +995,15 @@ fn select_nested_with_filters() { #[test] fn table_with_column_alias() { -let sql = "SE

Re: [PR] feat: Optimize hash util for `MapArray` [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n commented on code in PR #20179: URL: https://github.com/apache/datafusion/pull/20179#discussion_r2777810372 ## datafusion/common/src/hash_utils.rs: ## @@ -630,6 +679,69 @@ fn hash_union_array( Ok(()) } +/// Hash a sparse union array. +/// Sparse unions have c

Re: [PR] feat: Optimize hash util for `MapArray` [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n commented on code in PR #20179: URL: https://github.com/apache/datafusion/pull/20179#discussion_r2777829578 ## datafusion/common/src/hash_utils.rs: ## @@ -630,6 +679,69 @@ fn hash_union_array( Ok(()) } +/// Hash a sparse union array. +/// Sparse unions have c

Re: [PR] Adds support for ANSI mode in negative function [datafusion]

2026-02-07 Thread via GitHub
comphead commented on code in PR #20189: URL: https://github.com/apache/datafusion/pull/20189#discussion_r2777859495 ## datafusion/spark/src/function/math/negative.rs: ## @@ -96,37 +95,80 @@ impl ScalarUDFImpl for SparkNegative { } fn invoke_with_args(&self, args: Sc

Re: [PR] Adds support for ANSI mode in negative function [datafusion]

2026-02-07 Thread via GitHub
comphead commented on code in PR #20189: URL: https://github.com/apache/datafusion/pull/20189#discussion_r2777860472 ## datafusion/spark/src/function/math/negative.rs: ## @@ -147,56 +189,154 @@ fn spark_negative(args: &[ColumnarValue]) -> Result { Ok(ColumnarVa

Re: [PR] feat: Optimize hash util for `MapArray` [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n commented on code in PR #20179: URL: https://github.com/apache/datafusion/pull/20179#discussion_r2777805353 ## datafusion/common/src/hash_utils.rs: ## @@ -481,23 +483,39 @@ fn hash_map_array( let offsets = array.offsets(); // Create hashes for each entry

Re: [PR] feat: Optimize hash util for `MapArray` [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n commented on code in PR #20179: URL: https://github.com/apache/datafusion/pull/20179#discussion_r2777808340 ## datafusion/common/src/hash_utils.rs: ## @@ -481,23 +483,39 @@ fn hash_map_array( let offsets = array.offsets(); // Create hashes for each entry

Re: [PR] feat: Optimize hash util for `MapArray` [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n commented on code in PR #20179: URL: https://github.com/apache/datafusion/pull/20179#discussion_r2777827880 ## datafusion/common/src/hash_utils.rs: ## @@ -630,6 +679,69 @@ fn hash_union_array( Ok(()) } +/// Hash a sparse union array. +/// Sparse unions have c

[I] Potential Optimizations for `hash_union_array` [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n opened a new issue, #20211: URL: https://github.com/apache/datafusion/issues/20211 ### Is your feature request related to a problem or challenge? https://github.com/apache/datafusion/pull/20179#discussion_r250677 @Jefffrey has some good ideas to consider for opti

Re: [PR] Break upgrade guides into separate pages [datafusion]

2026-02-07 Thread via GitHub
avantgardnerio commented on PR #20183: URL: https://github.com/apache/datafusion/pull/20183#issuecomment-3865027877 Definitely an improvement, thank you! I can click into each guide and do `ctrl-f` just as I hoped! Unfortunately when I click on "Upgrade Guides", I see this: htt

Re: [I] LIKE fails on nested value [datafusion]

2026-02-07 Thread via GitHub
Tushar7012 commented on issue #20210: URL: https://github.com/apache/datafusion/issues/20210#issuecomment-3865033889 Hi @neilconway , I looked into the issue and the failure seems to come from how apply_cmp currently assumes scalar (flat) inputs when handling operators like LIKE, NOT LI

[PR] fix: Throw coercion error for `LIKE` operations for nested types. [datafusion]

2026-02-07 Thread via GitHub
jonathanc-n opened a new pull request, #20212: URL: https://github.com/apache/datafusion/pull/20212 ## Which issue does this PR close? - Closes #20210. ## Rationale for this change Throw coercion error for LIKE adjacent operations. This matches DuckDB behaviour, just

Re: [PR] feat: add support for `GetTimestamp`, `parse_to_date`, `parse_to_timestamp` expressions [datafusion-comet]

2026-02-07 Thread via GitHub
coderfender commented on code in PR #3438: URL: https://github.com/apache/datafusion-comet/pull/3438#discussion_r2777908482 ## native/spark-expr/Cargo.toml: ## @@ -30,6 +30,7 @@ edition = { workspace = true } arrow = { workspace = true } chrono = { workspace = true } datafusi

Re: [PR] feat: add support for `GetTimestamp`, `parse_to_date`, `parse_to_timestamp` expressions [datafusion-comet]

2026-02-07 Thread via GitHub
coderfender commented on code in PR #3438: URL: https://github.com/apache/datafusion-comet/pull/3438#discussion_r2777908828 ## native/Cargo.toml: ## @@ -36,9 +36,10 @@ rust-version = "1.88" [workspace.dependencies] arrow = { version = "57.2.0", features = ["prettyprint", "ffi"

Re: [PR] feat: add support for `GetTimestamp`, `parse_to_date`, `parse_to_timestamp` expressions [datafusion-comet]

2026-02-07 Thread via GitHub
coderfender commented on PR #3438: URL: https://github.com/apache/datafusion-comet/pull/3438#issuecomment-3865053126 @rafafrdz , Thank you for working on this and welcome to the datafusion community :) . I would suggest breaking down this PR by feature to make it easier for reviewers (an

Re: [I] LIKE fails on nested value [datafusion]

2026-02-07 Thread via GitHub
neilconway commented on issue #20210: URL: https://github.com/apache/datafusion/issues/20210#issuecomment-3865056172 @Tushar7012 Makes sense to me!

Re: [PR] IN LIST optims [datafusion]

2026-02-07 Thread via GitHub
alamb-ghbot commented on PR #19390: URL: https://github.com/apache/datafusion/pull/19390#issuecomment-3864210250 🤖: Benchmark completed Details ``` group main perf_in_list_optim

[PR] fix: add parentheses to SchemaDisplay/SqlDisplay for BinaryExpr [datafusion]

2026-02-07 Thread via GitHub
AndreaBozzo opened a new pull request, #20206: URL: https://github.com/apache/datafusion/pull/20206 ## Which issue does this PR close? Closes #16054 ## Rationale for this change Expression formatting like `(1+2)*3` displays as `Int64(1) + Int64(2) * Int64(3)`, losing par
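
The formatting problem described here, `(1+2)*3` printing as `Int64(1) + Int64(2) * Int64(3)`, can be reproduced and fixed in a small standalone sketch. This is not DataFusion's `SchemaDisplay`/`SqlDisplay` code; the `Expr` and `Op` types below are hypothetical, and the rule shown (wrap a child in parentheses when its operator binds less tightly than its parent's) is one common approach, assumed here for illustration.

```rust
use std::fmt;

/// Hypothetical operator type; only two operators are modelled.
#[derive(Clone, Copy, PartialEq)]
enum Op {
    Add,
    Mul,
}

impl Op {
    /// Higher number means tighter binding.
    fn precedence(self) -> u8 {
        match self {
            Op::Add => 1,
            Op::Mul => 2,
        }
    }

    fn symbol(self) -> &'static str {
        match self {
            Op::Add => "+",
            Op::Mul => "*",
        }
    }
}

/// Hypothetical expression tree, not DataFusion's `Expr`.
enum Expr {
    Int64(i64),
    Binary(Box<Expr>, Op, Box<Expr>),
}

impl Expr {
    /// Print a child, adding parentheses only when the child's operator
    /// has lower precedence than the parent's. Non-associative operators
    /// such as `-` would need an extra check for equal precedence.
    fn fmt_child(&self, parent: Op, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Binary(_, op, _) if op.precedence() < parent.precedence() => {
                write!(f, "({})", self)
            }
            _ => write!(f, "{}", self),
        }
    }
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Int64(v) => write!(f, "Int64({})", v),
            Expr::Binary(left, op, right) => {
                left.fmt_child(*op, f)?;
                write!(f, " {} ", op.symbol())?;
                right.fmt_child(*op, f)
            }
        }
    }
}
```

With this rule, a tree for `(1+2)*3` keeps its grouping when printed, while `1+2*3` is printed without redundant parentheses.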

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb-ghbot commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864623663 🤖: Benchmark completed Details ``` Comparing HEAD and query_29 Benchmark clickbench_extended.json ┏━━

Re: [PR] Caching the /target to avoid recompilation [datafusion]

2026-02-07 Thread via GitHub
Suryansh-Dey commented on PR #20186: URL: https://github.com/apache/datafusion/pull/20186#issuecomment-3864638175 Here is the Comparison # Without cache https://github.com/user-attachments/assets/3e5d8e6b-97dd-4afe-b61c-43a912910109 [source](https://github.com/Suryansh-Dey

Re: [PR] fix: Avoid integer overflow in substr() [datafusion]

2026-02-07 Thread via GitHub
alamb commented on code in PR #20199: URL: https://github.com/apache/datafusion/pull/20199#discussion_r2777503268 ## datafusion/functions/src/unicode/substr.rs: ## @@ -247,19 +250,19 @@ pub fn enable_ascii_fast_path<'a, V: StringArrayType<'a>>( // HACK: can be sim

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864463393 run benchmarks

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb-ghbot commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864463357 🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running Linux aal-dev 6.

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864463154 run benchmark sql_planner

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864463741 Thanks @devanshu0987 -- I kicked off some benchmarks

Re: [PR] fix: Avoid integer overflow in split_part() [datafusion]

2026-02-07 Thread via GitHub
alamb merged PR #20198: URL: https://github.com/apache/datafusion/pull/20198

Re: [PR] Break upgrade guides into separate pages [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20183: URL: https://github.com/apache/datafusion/pull/20183#issuecomment-3864467459 CI still seems to fail

Re: [I] InList support for pre-image udf [datafusion]

2026-02-07 Thread via GitHub
alamb closed issue #20050: InList support for pre-image udf URL: https://github.com/apache/datafusion/issues/20050

Re: [I] Avoid recompute CTEs (common table expressions) / share input plans [datafusion]

2026-02-07 Thread via GitHub
alamb commented on issue #8777: URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3864514391 > This is a great solution. We cannot enable it if there's a join. Why not? Can you provide an example of what you are thinking of here? Is this like if the join probe

Re: [PR] fix(datafusion-cli): solve row count bug adding`saturating_add` to prevent potential overflow [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20185: URL: https://github.com/apache/datafusion/pull/20185#issuecomment-3864464850 Thanks @dariocurr and @Jefffrey

Re: [PR] Enable inlist support for preimage [datafusion]

2026-02-07 Thread via GitHub
alamb merged PR #20051: URL: https://github.com/apache/datafusion/pull/20051

Re: [PR] Caching the /target to avoid recompilation [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #20186: URL: https://github.com/apache/datafusion/pull/20186#issuecomment-3864459741 Thank you @Suryansh-Dey Can you please measure the impact this change has on build times (with links to the jobs where you measured)? We have found in the past that these

Re: [PR] feat: support limited deletion [datafusion]

2026-02-07 Thread via GitHub
alamb merged PR #20137: URL: https://github.com/apache/datafusion/pull/20137

Re: [PR] fix(datafusion-cli): solve row count bug adding`saturating_add` to prevent potential overflow [datafusion]

2026-02-07 Thread via GitHub
alamb merged PR #20185: URL: https://github.com/apache/datafusion/pull/20185
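
For readers unfamiliar with the technique named in this PR title, here is a minimal sketch of summing row counts with `saturating_add`. It is not the datafusion-cli code; `total_rows` and its input slice are hypothetical. `usize::saturating_add` is a standard-library method that clamps at `usize::MAX` instead of overflowing.

```rust
/// Sum per-batch row counts, clamping at `usize::MAX` rather than
/// overflowing. (Hypothetical helper for illustration only.)
fn total_rows(batch_row_counts: &[usize]) -> usize {
    batch_row_counts
        .iter()
        .fold(0usize, |acc, &n| acc.saturating_add(n))
}
```

With plain `+`, an overflowing sum panics in debug builds and wraps in release builds; the saturating form trades exactness at the extreme for a result that is never smaller than any partial sum.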

Re: [PR] Updated `parse_infix(..)` in `mysql.rs` and `sqlite.rs` to handle error rather than `unwrap()` [datafusion-sqlparser-rs]

2026-02-07 Thread via GitHub
RPG-Alex commented on PR #2207: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2207#issuecomment-3864485145 I realized the `sqlite.rs` `parse_infix()` was already patched. My PR only applies to `mysql.rs` now.

Re: [PR] feat: Add selectivity-tracking wrapper for dynamic filters [datafusion]

2026-02-07 Thread via GitHub
adriangb commented on PR #20160: URL: https://github.com/apache/datafusion/pull/20160#issuecomment-3864495397 @Dandandan mind giving this a look and maybe running benchmarks locally to see if you can repro the difference with CI?

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb-ghbot commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864551854 🤖: Benchmark completed Details ``` group main query_29 -

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-07 Thread via GitHub
alamb-ghbot commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3864551995 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] [TESTING] Test parquet filter pushdown with mask backed row selection [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #19301: URL: https://github.com/apache/datafusion/pull/19301#issuecomment-3865077014 Let's continue at https://github.com/apache/datafusion/pull/19477

Re: [PR] [TESTING] Test parquet filter pushdown with mask backed row selection [datafusion]

2026-02-07 Thread via GitHub
alamb closed pull request #19301: [TESTING] Test parquet filter pushdown with mask backed row selection URL: https://github.com/apache/datafusion/pull/19301

Re: [I] LIKE fails on nested value [datafusion]

2026-02-07 Thread via GitHub
Tushar7012 commented on issue #20210: URL: https://github.com/apache/datafusion/issues/20210#issuecomment-3865076640 Just having a PR for this Issue.

Re: [PR] chore: Unbreak doctest CI [datafusion]

2026-02-07 Thread via GitHub
Jefffrey merged PR #20218: URL: https://github.com/apache/datafusion/pull/20218

Re: [PR] chore: Unbreak doctest CI [datafusion]

2026-02-07 Thread via GitHub
Jefffrey commented on PR #20218: URL: https://github.com/apache/datafusion/pull/20218#issuecomment-3865998037 Thanks @neilconway, merged to unbreak CI > Note that there are still various references to a file called `upgrading.md` in the source tree; I haven't attempted to fix all thos

Re: [PR] fix: Throw coercion error for `LIKE` operations for nested types. [datafusion]

2026-02-07 Thread via GitHub
Jefffrey commented on code in PR #20212: URL: https://github.com/apache/datafusion/pull/20212#discussion_r2778416458 ## datafusion/sqllogictest/test_files/type_coercion.slt: ## @@ -254,3 +254,30 @@ DROP TABLE orders; ## Test type coerci

Re: [PR] Add `StructArray` and `RunArray` benchmark tests to `with_hashes` [datafusion]

2026-02-07 Thread via GitHub
Jefffrey commented on code in PR #20182: URL: https://github.com/apache/datafusion/pull/20182#discussion_r2778425679 ## datafusion/common/benches/with_hashes.rs: ## @@ -47,50 +51,75 @@ fn criterion_benchmark(c: &mut Criterion) { BenchData { name: "int64",

Re: [PR] fix: percentile_cont interpolation causes NaN for f16 input [datafusion]

2026-02-07 Thread via GitHub
Jefffrey commented on code in PR #20208: URL: https://github.com/apache/datafusion/pull/20208#discussion_r2778423798 ## datafusion/functions-aggregate/src/percentile_cont.rs: ## @@ -58,17 +58,50 @@ use datafusion_macros::user_doc; use crate::utils::validate_percentile_expr;

Re: [PR] perf: Optimize scalar fast path for nanvl [datafusion]

2026-02-07 Thread via GitHub
Jefffrey commented on code in PR #20205: URL: https://github.com/apache/datafusion/pull/20205#discussion_r2778411209 ## datafusion/functions/src/math/nanvl.rs: ## @@ -101,7 +104,53 @@ impl ScalarUDFImpl for NanvlFunc { } fn invoke_with_args(&self, args: ScalarFunctio

Re: [PR] Better document the relationship between `FileFormat::projection` / `FileFormat::filter` and `FileScanConfig::Statistics` [datafusion]

2026-02-07 Thread via GitHub
zhuqi-lucas commented on code in PR #20188: URL: https://github.com/apache/datafusion/pull/20188#discussion_r2778691172 ## datafusion/datasource/src/file.rs: ## @@ -46,6 +46,12 @@ pub fn as_file_source(source: T) -> Arc /// file format specific behaviors for elements in [`Da

Re: [PR] Add `StructArray` and `RunArray` benchmark tests to `with_hashes` [datafusion]

2026-02-07 Thread via GitHub
notashes commented on code in PR #20182: URL: https://github.com/apache/datafusion/pull/20182#discussion_r2778645410 ## datafusion/common/benches/with_hashes.rs: ## @@ -47,50 +51,75 @@ fn criterion_benchmark(c: &mut Criterion) { BenchData { name: "int64",

Re: [I] Unpin infra repo commit hash in `Makefile` [datafusion-site]

2026-02-07 Thread via GitHub
Abhinandankaushik commented on issue #144: URL: https://github.com/apache/datafusion-site/issues/144#issuecomment-3866382844 I understand, I will surely work on this issue on a separate branch. Thanks

Re: [PR] feat: optimize CASE WHEN for divide-by-zero protection pattern [datafusion]

2026-02-07 Thread via GitHub
alamb commented on PR #19994: URL: https://github.com/apache/datafusion/pull/19994#issuecomment-3864368279 Thanks again @pepijnve and @CuteChuanChuan

Re: [PR] fix: add parentheses to SchemaDisplay/SqlDisplay for BinaryExpr [datafusion]

2026-02-07 Thread via GitHub
AndreaBozzo commented on PR #20206: URL: https://github.com/apache/datafusion/pull/20206#issuecomment-3864365156 cc @Jefffrey

[PR] Updated `parse_infix(..)` in `mysql.rs` and `sqlite.rs` to handle error rather than `unwrap()` [datafusion-sqlparser-rs]

2026-02-07 Thread via GitHub
RPG-Alex opened a new pull request, #2207: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2207 Previously for both `sqlite` and `mysql` the `parse_infix` would panic if passed `parser: &mut crate::parser::Parser` returned an error: ```rust fn parse_infix( &s

[PR] chore: Add confirmation before tarball is released [datafusion]

2026-02-07 Thread via GitHub
milenkovicm opened a new pull request, #20207: URL: https://github.com/apache/datafusion/pull/20207 ## Which issue does this PR close? - Closes #. ## Rationale for this change Add confirmation dialogue to confirm tarball release ## What changes are included in thi

[PR] chore: add confirmation before tarball is released [datafusion-ballista]

2026-02-07 Thread via GitHub
milenkovicm opened a new pull request, #1445: URL: https://github.com/apache/datafusion-ballista/pull/1445 # Which issue does this PR close? Closes #. # Rationale for this change Add confirmation dialogue to confirm tarball release # What changes are included in t

[PR] chore: add confirmation before tarball is released [datafusion-sqlparser-rs]

2026-02-07 Thread via GitHub
milenkovicm opened a new pull request, #2208: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2208 Update tarball-release.sh to ask y/N confirmation before it proceeds to release upload
