Re: [PR] feat: support Spark-compatible `json_tuple` function [datafusion]

2026-02-20 Thread via GitHub
CuteChuanChuan commented on code in PR #20412: URL: https://github.com/apache/datafusion/pull/20412#discussion_r2832379459 ## datafusion/spark/src/function/json/json_tuple.rs: ## @@ -0,0 +1,244 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: support Spark-compatible `json_tuple` function [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on code in PR #20412: URL: https://github.com/apache/datafusion/pull/20412#discussion_r2832364193 ## datafusion/spark/src/function/json/json_tuple.rs: ## @@ -0,0 +1,244 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] test: add sqllogictest coverage for UDWF return types in information_… [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on code in PR #20098: URL: https://github.com/apache/datafusion/pull/20098#discussion_r2832354449 ## datafusion/sqllogictest/test_files/information_schema.slt: ## @@ -812,6 +812,49 @@ select is_deterministic from information_schema.routines where routine_name

Re: [PR] feat: introduce hadoop mini cluster to test native scan on hdfs [datafusion-comet]

2026-02-20 Thread via GitHub
ariel-miculas commented on code in PR #1556: URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2832154248 ## native/core/Cargo.toml: ## @@ -68,7 +68,7 @@ datafusion-comet-proto = { workspace = true } object_store = { workspace = true } url = { workspace = t

Re: [PR] Add SETOF support for PostgreSQL function return types [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio commented on code in PR #2217: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2217#discussion_r2832162422 ## src/ast/spans.rs: ## @@ -632,33 +657,6 @@ impl Spanned for TableConstraint { } } -impl Spanned for PartitionBoundValue { Review Comment:

Re: [PR] build: update Rust toolchain version from 1.92.0 to 1.93.0 in `rust-toolchain.toml` [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on PR #20309: URL: https://github.com/apache/datafusion/pull/20309#issuecomment-3932582606 Thanks @dariocurr, @neilconway & @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] build: update Rust toolchain version from 1.92.0 to 1.93.0 in `rust-toolchain.toml` [datafusion]

2026-02-20 Thread via GitHub
Jefffrey merged PR #20309: URL: https://github.com/apache/datafusion/pull/20309

Re: [PR] feat: support Spark-compatible `json_tuple` function [datafusion]

2026-02-20 Thread via GitHub
CuteChuanChuan commented on code in PR #20412: URL: https://github.com/apache/datafusion/pull/20412#discussion_r2832197621 ## datafusion/spark/src/function/json/json_tuple.rs: ## @@ -0,0 +1,255 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: support Spark-compatible `json_tuple` function [datafusion]

2026-02-20 Thread via GitHub
CuteChuanChuan commented on code in PR #20412: URL: https://github.com/apache/datafusion/pull/20412#discussion_r2832198239 ## datafusion/spark/src/function/json/json_tuple.rs: ## @@ -0,0 +1,255 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: support Spark-compatible `json_tuple` function [datafusion]

2026-02-20 Thread via GitHub
CuteChuanChuan commented on code in PR #20412: URL: https://github.com/apache/datafusion/pull/20412#discussion_r2832196468 ## datafusion/spark/src/function/json/json_tuple.rs: ## @@ -0,0 +1,255 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [I] Release DataFusion 52.2.0 (minor/) Release (Feb 2026) [datafusion]

2026-02-20 Thread via GitHub
jackkleeman commented on issue #20287: URL: https://github.com/apache/datafusion/issues/20287#issuecomment-3932611289 https://github.com/apache/datafusion/issues/20445 I think this is a candidate for something we need to fix in 52.2

Re: [PR] chore(deps): bump tonic-prost-build from 0.14.4 to 0.14.5 [datafusion-ballista]

2026-02-20 Thread via GitHub
milenkovicm merged PR #1468: URL: https://github.com/apache/datafusion-ballista/pull/1468

Re: [PR] [Oracle] Table alias for INSERTed table [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio commented on code in PR #2214: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2214#discussion_r2832228786 ## src/parser/mod.rs: ## @@ -17166,12 +17166,27 @@ impl<'a> Parser<'a> { let table = self.parse_keyword(Keyword::TABLE); let t

Re: [PR] feat: support Spark-compatible `json_tuple` function [datafusion]

2026-02-20 Thread via GitHub
CuteChuanChuan commented on PR #20412: URL: https://github.com/apache/datafusion/pull/20412#issuecomment-3932628385 Hi @comphead and @Jefffrey, I appreciate the review. I added more edge cases and revised the places you pointed out. PTAL when you have a chance. Thanks.

Re: [PR] Upgrade to sqlparser 0.61.0 [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on code in PR #20177: URL: https://github.com/apache/datafusion/pull/20177#discussion_r2832072122 ## datafusion/sql/src/statement.rs: ## @@ -1421,6 +1441,9 @@ impl SqlToRel<'_, S> { if on_cluster.is_some() { return not_imp

[I] [Bug] [v52 regression] Panic in `GroupOrderingPartial::remove_groups` when Partial aggregate with PartiallySorted hits memory pressure [datafusion]

2026-02-20 Thread via GitHub
jackkleeman opened a new issue, #20445: URL: https://github.com/apache/datafusion/issues/20445 ### Describe the bug Partial aggregation with `PartiallySorted` ordering panics when memory pressure triggers the `EmitEarly` OOM path: ``` assertion failed: *current_sort >= n

[PR] Clamp early aggregation emit to the sort boundary when using partial group ordering [datafusion]

2026-02-20 Thread via GitHub
jackkleeman opened a new pull request, #20446: URL: https://github.com/apache/datafusion/pull/20446 ## Which issue does this PR close? - Closes #20445. ## What changes are included in this PR? Fix a panic on early emit with partial sort aggregations, by clamping o
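The assertion `*current_sort >= n` quoted in #20445 fails whenever more groups are removed than lie before the sort boundary, and #20446 describes clamping the early emit to that boundary. Below is a minimal std-only sketch of the clamp, assuming a simplified model in which `current_sort` counts the leading groups whose sort prefix is complete and `requested` is what the memory-pressure `EmitEarly` path asks for. `clamp_early_emit` and this `remove_groups` are hypothetical stand-ins, not DataFusion's code.

```rust
/// Hypothetical helper: how many groups may be emitted early.
/// `requested` is what the memory-pressure path wants to emit;
/// `sort_boundary` is the number of leading groups already complete.
fn clamp_early_emit(requested: usize, sort_boundary: usize) -> usize {
    // Never emit past the sort boundary.
    requested.min(sort_boundary)
}

/// Hypothetical stand-in for `remove_groups`, keeping only the invariant
/// quoted in the issue: groups can only be removed up to the boundary.
fn remove_groups(current_sort: &mut usize, n: usize) {
    assert!(*current_sort >= n, "cannot remove groups past the sort boundary");
    *current_sort -= n;
}
```

With a boundary of 4 and a request for 10 groups, only 4 are emitted and the invariant holds.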

Re: [PR] chore(deps): bump tonic-build from 0.14.4 to 0.14.5 [datafusion-ballista]

2026-02-20 Thread via GitHub
milenkovicm merged PR #1469: URL: https://github.com/apache/datafusion-ballista/pull/1469

Re: [I] Optimize trim UDFs for single-character trim patterns [datafusion]

2026-02-20 Thread via GitHub
Jefffrey closed issue #20327: Optimize trim UDFs for single-character trim patterns URL: https://github.com/apache/datafusion/issues/20327

Re: [PR] perf: Optimize trim UDFs for single-character trims [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on PR #20328: URL: https://github.com/apache/datafusion/pull/20328#issuecomment-3932646537 Thanks @neilconway & @martin-g

Re: [PR] perf: Optimize trim UDFs for single-character trims [datafusion]

2026-02-20 Thread via GitHub
Jefffrey merged PR #20328: URL: https://github.com/apache/datafusion/pull/20328
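Issue #20327 and PR #20328 concern a fast path for single-character trim patterns. A std-only sketch of the idea, assuming the general path treats the pattern as a set of characters (as SQL `btrim` does); `btrim` here is a hypothetical free function, not DataFusion's UDF code, and we have not checked how the PR itself structures the fast path.

```rust
/// Trim any character in `pattern` from both ends of `s`.
/// Hypothetical illustration: a one-`char` pattern takes a direct
/// `char` match instead of a per-position set-membership test.
fn btrim<'a>(s: &'a str, pattern: &str) -> &'a str {
    let mut chars = pattern.chars();
    match (chars.next(), chars.next()) {
        // Empty pattern: nothing to trim.
        (None, _) => s,
        // Fast path: exactly one character.
        (Some(c), None) => s.trim_matches(c),
        // General path: trim any character contained in `pattern`.
        _ => {
            let set: Vec<char> = pattern.chars().collect();
            s.trim_matches(|c: char| set.contains(&c))
        }
    }
}
```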

Re: [PR] feat: support `arrays_zip` function [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on code in PR #20440: URL: https://github.com/apache/datafusion/pull/20440#discussion_r2832258368 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -64,6 +65,13 @@ make_udf_expr_and_func!( array_distinct_udf ); +make_udf_expr_and_func!( Review Comm

Re: [PR] bench: Add dynamic IN list benchmarks for non-constant list expressions [datafusion]

2026-02-20 Thread via GitHub
adriangb commented on code in PR #20444: URL: https://github.com/apache/datafusion/pull/20444#discussion_r2832276223 ## datafusion/physical-expr/benches/in_list.rs: ## @@ -50,7 +51,9 @@ fn random_string(rng: &mut StdRng, len: usize) -> String { } const IN_LIST_LENGTHS: [usiz

Re: [PR] perf: Optimize scalar fast path for `regexp_like` and rejects g inside combined flags like ig [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on code in PR #20354: URL: https://github.com/apache/datafusion/pull/20354#discussion_r2832316597 ## datafusion/functions/src/regex/regexplike.rs: ## @@ -314,6 +328,89 @@ pub fn regexp_like(args: &[ArrayRef]) -> Result { } } +fn scalar_string(value: &S
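The PR title says `regexp_like` should reject `g` inside combined flags such as `ig`, not only the bare `g` flag. A minimal sketch of that check, assuming flags arrive as a plain string; `validate_regexp_like_flags` and its error wording are hypothetical, not DataFusion's actual function or message.

```rust
/// Hypothetical flag validation: reject the "global" flag `g` wherever it
/// appears in the flags string, not only when the string is exactly "g".
fn validate_regexp_like_flags(flags: &str) -> Result<(), String> {
    if flags.contains('g') {
        return Err(format!(
            "regexp_like() does not support the global flag 'g' (flags: {flags:?})"
        ));
    }
    Ok(())
}
```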

Re: [PR] [Oracle] Table alias for INSERTed table [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
xitep commented on code in PR #2214: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2214#discussion_r2832516436 ## src/parser/mod.rs: ## @@ -17166,12 +17166,27 @@ impl<'a> Parser<'a> { let table = self.parse_keyword(Keyword::TABLE); let ta

Re: [PR] refactor: Extract sort-merge join filter logic into separate module [datafusion]

2026-02-20 Thread via GitHub
viirya commented on PR #19614: URL: https://github.com/apache/datafusion/pull/19614#issuecomment-3937630035 Thank you @comphead

Re: [PR] chore: Consolidate TPC benchmark scripts [datafusion-comet]

2026-02-20 Thread via GitHub
andygrove merged PR #3538: URL: https://github.com/apache/datafusion-comet/pull/3538

Re: [PR] chore: Consolidate TPC benchmark scripts [datafusion-comet]

2026-02-20 Thread via GitHub
andygrove commented on PR #3538: URL: https://github.com/apache/datafusion-comet/pull/3538#issuecomment-3937669041 Thanks @comphead @mbutrovich

Re: [PR] chore: Cleanup "!is_valid(i)" -> "is_null(i)" [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on PR #20453: URL: https://github.com/apache/datafusion/pull/20453#issuecomment-3937927924 Thanks @neilconway & @comphead, nice cleanup

Re: [PR] chore: Cleanup "!is_valid(i)" -> "is_null(i)" [datafusion]

2026-02-20 Thread via GitHub
Jefffrey merged PR #20453: URL: https://github.com/apache/datafusion/pull/20453

Re: [PR] feat: Implement Spark `bitmap_bucket_number` function [datafusion]

2026-02-20 Thread via GitHub
Jefffrey commented on PR #20288: URL: https://github.com/apache/datafusion/pull/20288#issuecomment-3937933402 Thanks @kazantsev-maksim

Re: [PR] feat: Implement Spark `bitmap_bucket_number` function [datafusion]

2026-02-20 Thread via GitHub
Jefffrey merged PR #20288: URL: https://github.com/apache/datafusion/pull/20288

Re: [PR] refactor: Extract sort-merge join filter logic into separate module [datafusion]

2026-02-20 Thread via GitHub
viirya merged PR #19614: URL: https://github.com/apache/datafusion/pull/19614

Re: [PR] Chore: Code hygiene - warn-numeric-widen [datafusion-comet]

2026-02-20 Thread via GitHub
github-actions[bot] closed pull request #2588: Chore: Code hygiene - warn-numeric-widen URL: https://github.com/apache/datafusion-comet/pull/2588

Re: [PR] fix: Throws an exception when struct type has duplicate keys [datafusion-comet]

2026-02-20 Thread via GitHub
github-actions[bot] closed pull request #2459: fix: Throws an exception when struct type has duplicate keys URL: https://github.com/apache/datafusion-comet/pull/2459

Re: [PR] Chore: Fix Scala code warnings - Spark module [datafusion-comet]

2026-02-20 Thread via GitHub
github-actions[bot] closed pull request #2558: Chore: Fix Scala code warnings - Spark module URL: https://github.com/apache/datafusion-comet/pull/2558

Re: [PR] fix: Support scalar/array args for rpad/read_side_padding [datafusion-comet]

2026-02-20 Thread via GitHub
github-actions[bot] closed pull request #2482: fix: Support scalar/array args for rpad/read_side_padding URL: https://github.com/apache/datafusion-comet/pull/2482

Re: [PR] perf: Add ReflectionCache for Iceberg serialization optimization [iceberg] [datafusion-comet]

2026-02-20 Thread via GitHub
Shekharrajak commented on code in PR #3558: URL: https://github.com/apache/datafusion-comet/pull/3558#discussion_r2835867389 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometIcebergSerializationBenchmark.scala: ## @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Softw

Re: [PR] perf: Add ReflectionCache for Iceberg serialization optimization [iceberg] [datafusion-comet]

2026-02-20 Thread via GitHub
Shekharrajak commented on code in PR #3558: URL: https://github.com/apache/datafusion-comet/pull/3558#discussion_r2835870939 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometIcebergSerializationBenchmark.scala: ## @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [RFC] Add lambda support and array_transform udf [datafusion]

2026-02-20 Thread via GitHub
gstvg commented on PR #18921: URL: https://github.com/apache/datafusion/pull/18921#issuecomment-3938399669 Thanks @linhr! It mostly applies to having lambdas and args partitioned, omitting the body on `TreeNode` and removing `Expr::Lambda`. Changing the PR to just use a new `Expr::LambdaFunct

[PR] chore: Add TPC-* queries to repo [datafusion-comet]

2026-02-20 Thread via GitHub
andygrove opened a new pull request, #3562: URL: https://github.com/apache/datafusion-comet/pull/3562 ## Which issue does this PR close? N/A ## Rationale for this change The benchmark scripts in `benchmarks/tpc` currently require the user to provide the q

Re: [PR] perf: Switch to a channel instead of yield_now() on Pending during executePlan [iceberg] [datafusion-comet]

2026-02-20 Thread via GitHub
mbutrovich commented on PR #3553: URL: https://github.com/apache/datafusion-comet/pull/3553#issuecomment-3933648140 My biggest concern with this design is related to thread-local data. Queries are now running on tokio workers instead of the executor task threads, so any access to thread-lo
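The thread-local concern is that state stored per thread is not visible once work moves to a different thread. The same hazard, illustrated with Rust's `thread_local!`: a value set on one thread reads as unset on another. `TASK_CONTEXT` and `observe_across_threads` are hypothetical names, and this is only an analogy for whatever executor-thread state the comment refers to.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical per-thread state, standing in for data that code
    // expects to find on the thread that started the task.
    static TASK_CONTEXT: RefCell<Option<String>> = RefCell::new(None);
}

/// Set the thread-local on the calling thread, then read it from a second
/// thread. Returns (value seen by caller, value seen by the other thread).
fn observe_across_threads() -> (Option<String>, Option<String>) {
    TASK_CONTEXT.with(|c| *c.borrow_mut() = Some("task-42".to_string()));
    let here = TASK_CONTEXT.with(|c| c.borrow().clone());
    let there = thread::spawn(|| TASK_CONTEXT.with(|c| c.borrow().clone()))
        .join()
        .expect("worker thread panicked");
    (here, there)
}
```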

Re: [I] make_array coercion failure when argument is of type struct [datafusion]

2026-02-20 Thread via GitHub
Mark1626 closed issue #20429: make_array coercion failure when argument is of type struct URL: https://github.com/apache/datafusion/issues/20429

Re: [I] make_array coercion failure when argument is of type struct [datafusion]

2026-02-20 Thread via GitHub
Mark1626 commented on issue #20429: URL: https://github.com/apache/datafusion/issues/20429#issuecomment-3932981760 My bad, this has to come from the user. The Databricks query uses the `struct` function, which adds `col1`, `col2`; when `named_struct` is used the same error as Datafusion is th

Re: [PR] chore(deps): bump tonic-prost from 0.14.4 to 0.14.5 [datafusion-ballista]

2026-02-20 Thread via GitHub
milenkovicm merged PR #1467: URL: https://github.com/apache/datafusion-ballista/pull/1467

Re: [PR] perf: reduce read amplification for partitioned JSON file scanning [datafusion]

2026-02-20 Thread via GitHub
ariel-miculas commented on code in PR #19687: URL: https://github.com/apache/datafusion/pull/19687#discussion_r2832884751 ## datafusion/datasource-json/src/source.rs: ## @@ -188,23 +187,59 @@ impl FileOpener for JsonOpener { let file_compression_type = self.file_compres

Re: [PR] fix(substrait): Correctly parse field references in subqueries [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on PR #20439: URL: https://github.com/apache/datafusion/pull/20439#issuecomment-3935506259 Hi @gabotechs @waynexia, would you have time to take a quick look at this when you get a chance? Thank you!

Re: [PR] Extend dynamic filter pushdown to Left and LeftSemi hash joins [datafusion]

2026-02-20 Thread via GitHub
helgikrs commented on code in PR #20447: URL: https://github.com/apache/datafusion/pull/20447#discussion_r2833732075 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -738,7 +738,7 @@ impl HashJoinExec { } fn allow_join_dynamic_filter_pushdown(&self, con

[PR] Add statistics-based guards to SortMergeJoin-to-HashJoin rewrite [datafusion-comet]

2026-02-20 Thread via GitHub
andygrove opened a new pull request, #3554: URL: https://github.com/apache/datafusion-comet/pull/3554 ## Summary - Add per-partition size check and size ratio check to `RewriteJoin`, mirroring Spark's own `JoinSelection` logic (`canBuildLocalHashMapBySize()` and `muchSmaller()`) -
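The two guards named in #3554 come from Spark's `JoinSelection`. Comet itself is Scala; this Rust sketch only illustrates the arithmetic as we understand Spark's checks: the build side must be smaller than the broadcast threshold times the number of shuffle partitions, and at least `factor` times smaller than the other side (Spark's default factor is, to our understanding, 3). `SideStats` and all function names here are hypothetical, and Spark's `-1` "disabled" threshold is not modeled.

```rust
/// Hypothetical size statistics for one side of a join.
struct SideStats {
    size_in_bytes: u64,
}

/// Shape of Spark's `canBuildLocalHashMapBySize`: the build side should
/// fit in roughly one broadcast threshold per shuffle partition.
fn can_build_local_hash_map_by_size(build: &SideStats, threshold: u64, partitions: u64) -> bool {
    build.size_in_bytes < threshold.saturating_mul(partitions)
}

/// Shape of Spark's `muchSmaller`: build side is `factor` times smaller.
fn much_smaller(build: &SideStats, stream: &SideStats, factor: u64) -> bool {
    build.size_in_bytes.saturating_mul(factor) <= stream.size_in_bytes
}

/// Allow the SortMergeJoin -> HashJoin rewrite only when both guards pass.
fn allow_hash_join_rewrite(
    build: &SideStats,
    stream: &SideStats,
    threshold: u64,
    partitions: u64,
    factor: u64,
) -> bool {
    can_build_local_hash_map_by_size(build, threshold, partitions)
        && much_smaller(build, stream, factor)
}
```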

Re: [I] Support filter pushdown through `SortMergeJoinExec` [datafusion]

2026-02-20 Thread via GitHub
petern48 commented on issue #20443: URL: https://github.com/apache/datafusion/issues/20443#issuecomment-3935598399 take

Re: [PR] perf: executePlan uses a channel to park executor task thread instead of yield_now() on Pending [iceberg] [datafusion-comet]

2026-02-20 Thread via GitHub
mbutrovich commented on PR #3553: URL: https://github.com/apache/datafusion-comet/pull/3553#issuecomment-3933775855 I updated the development guide with design considerations for thread-local data and JNI in this architecture. I will try to get more benchmarking results today.

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-20 Thread via GitHub
adriangb commented on PR #20414: URL: https://github.com/apache/datafusion/pull/20414#issuecomment-3934863737 > > From the PR description it isn't clear to me if you've run all of those commands to verify they work as expected. Maybe just update the description if they've all been manually

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-20 Thread via GitHub
adriangb commented on code in PR #20414: URL: https://github.com/apache/datafusion/pull/20414#discussion_r2833303783 ## docs/source/user-guide/example-usage.md: ## @@ -29,7 +29,7 @@ Find latest available Datafusion version on [DataFusion's crates.io] page. Add the dependency to

Re: [I] More type checking at logical planning [datafusion]

2026-02-20 Thread via GitHub
Acfboy commented on issue #20356: URL: https://github.com/apache/datafusion/issues/20356#issuecomment-3935416550 > I would like to work on this by adding early validation for these clauses, following the same error handling pattern used for SELECT expressions via Expr::to_field.

Re: [PR] perf: Optimize `array_has_any()` with scalar arg [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on code in PR #20385: URL: https://github.com/apache/datafusion/pull/20385#discussion_r2833632994 ## datafusion/functions-nested/src/array_has.rs: ## @@ -476,6 +483,179 @@ fn array_has_any_inner(args: &[ArrayRef]) -> Result { array_has_all_and_any_inne
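One common way to speed up `array_has_any()` when the second argument is a scalar is to build a hash set from the scalar array once and probe it for every row. A std-only sketch of that technique, assuming `i64` elements and that a null row yields a null result; `array_has_any_scalar` is hypothetical, and we have not checked that the PR uses exactly this approach.

```rust
use std::collections::HashSet;

/// Hypothetical fast path for `array_has_any(column, scalar_array)`:
/// hash the scalar array once, then test each row against the set
/// instead of comparing every row element-by-element with the scalar.
fn array_has_any_scalar(rows: &[Option<Vec<i64>>], scalar: &[i64]) -> Vec<Option<bool>> {
    let needles: HashSet<i64> = scalar.iter().copied().collect();
    rows.iter()
        // A null row stays null; otherwise true if any element is a needle.
        .map(|row| row.as_ref().map(|values| values.iter().any(|v| needles.contains(v))))
        .collect()
}
```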

Re: [PR] Add support for FFI config extensions [datafusion]

2026-02-20 Thread via GitHub
davisp commented on code in PR #19469: URL: https://github.com/apache/datafusion/pull/19469#discussion_r2833637773 ## datafusion/ffi/src/config/mod.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] Allow custom OptimizerHints [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio commented on code in PR #2216: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2216#discussion_r2833593678 ## src/parser/mod.rs: ## @@ -14168,53 +14168,75 @@ impl<'a> Parser<'a> { }) } -/// Parses an optional optimizer hint at the current

Re: [PR] PostgreSQL: Support more COMMENT ON object types [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio merged PR #2220: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2220

Re: [PR] PostgreSQL: Support more COMMENT ON object types [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
guan404ming commented on PR #2220: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2220#issuecomment-3935496236 Thanks!

Re: [PR] chore(deps): bump tonic from 0.14.4 to 0.14.5 [datafusion-ballista]

2026-02-20 Thread via GitHub
milenkovicm merged PR #1466: URL: https://github.com/apache/datafusion-ballista/pull/1466

Re: [PR] ci: run TPC-H benchmarks on a Kind Kubernetes cluster [datafusion-comet]

2026-02-20 Thread via GitHub
Shekharrajak commented on PR #3549: URL: https://github.com/apache/datafusion-comet/pull/3549#issuecomment-3933349318 > The benchmarks do run in k8s already, but using local mode rather than truly distributed. I am planning on making that change, and I also need to align this with the benc

[PR] feat: implement PhysicalOptimizerRule in FFI crate [datafusion]

2026-02-20 Thread via GitHub
timsaucer opened a new pull request, #20451: URL: https://github.com/apache/datafusion/pull/20451 DRAFT: Requires rebase after https://github.com/apache/datafusion/pull/20449 merges ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/pull/19469

Re: [I] Expose `PhysicalOptimizerRule` via FFI [datafusion]

2026-02-20 Thread via GitHub
timsaucer commented on issue #20450: URL: https://github.com/apache/datafusion/issues/20450#issuecomment-3934356516 FYI @gabotechs @robtandy this work epic is what will enable using `datafusion-distributed` with `datafusion-python`. Of course I have PRs for that repo, but this is the requir

Re: [PR] ci: run TPC-H benchmarks on a Kind Kubernetes cluster [datafusion-comet]

2026-02-20 Thread via GitHub
mbutrovich closed pull request #3549: ci: run TPC-H benchmarks on a Kind Kubernetes cluster URL: https://github.com/apache/datafusion-comet/pull/3549

Re: [PR] perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg] [datafusion-comet]

2026-02-20 Thread via GitHub
mbutrovich commented on PR #3553: URL: https://github.com/apache/datafusion-comet/pull/3553#issuecomment-3934473137 @sqlbenchmark run tpch --iterations 3

[PR] Add workflow to verify release candidate on multiple systems [datafusion-python]

2026-02-20 Thread via GitHub
timsaucer opened a new pull request, #1388: URL: https://github.com/apache/datafusion-python/pull/1388 # Which issue does this PR close? None # Rationale for this change Thank you to @kevinjqliu for this suggestion to include as part of our workflow # What change

Re: [PR] Add workflow to verify release candidate on multiple systems [datafusion-python]

2026-02-20 Thread via GitHub
timsaucer commented on PR #1388: URL: https://github.com/apache/datafusion-python/pull/1388#issuecomment-3934504844 Excellent idea @kevinjqliu ! I think there are two changes we should make: - [ ] Add note in the file describing that it is a manually run workflow as a hint for future

Re: [PR] perf: Optimize `array_has_any()` with scalar arg [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on code in PR #20385: URL: https://github.com/apache/datafusion/pull/20385#discussion_r2833497170 ## datafusion/functions-nested/src/array_has.rs: ## @@ -476,6 +483,179 @@ fn array_has_any_inner(args: &[ArrayRef]) -> Result { array_has_all_and_any_inne

Re: [PR] Prefer use of `peek_token_ref` over `peek_token` where valid [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio merged PR #2225: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2225

Re: [PR] perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg] [datafusion-comet]

2026-02-20 Thread via GitHub
mbutrovich merged PR #3553: URL: https://github.com/apache/datafusion-comet/pull/3553

Re: [PR] [Oracle] Table alias for INSERTed table [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio commented on code in PR #2214: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2214#discussion_r2833532056 ## src/parser/mod.rs: ## @@ -17166,12 +17166,27 @@ impl<'a> Parser<'a> { let table = self.parse_keyword(Keyword::TABLE); let t

Re: [PR] bench: Add dynamic IN list benchmarks for non-constant list expressions [datafusion]

2026-02-20 Thread via GitHub
zhangx commented on code in PR #20444: URL: https://github.com/apache/datafusion/pull/20444#discussion_r2833543369 ## datafusion/physical-expr/benches/in_list.rs: ## @@ -50,7 +51,9 @@ fn random_string(rng: &mut StdRng, len: usize) -> String { } const IN_LIST_LENGTHS: [us

Re: [PR] bench: Add dynamic IN list benchmarks for non-constant list expressions [datafusion]

2026-02-20 Thread via GitHub
zhangx commented on code in PR #20444: URL: https://github.com/apache/datafusion/pull/20444#discussion_r2833555998 ## datafusion/physical-expr/benches/in_list.rs: ## @@ -219,6 +222,144 @@ fn bench_realistic_mixed_strings( } } +/// Benchmarks the dynamic evaluation pa

Re: [PR] bench: Add dynamic IN list benchmarks for non-constant list expressions [datafusion]

2026-02-20 Thread via GitHub
adriangb commented on PR #20444: URL: https://github.com/apache/datafusion/pull/20444#issuecomment-3935317555 run benchmark in_list

Re: [PR] bench: Add dynamic IN list benchmarks for non-constant list expressions [datafusion]

2026-02-20 Thread via GitHub
zhangx commented on code in PR #20444: URL: https://github.com/apache/datafusion/pull/20444#discussion_r2833559732 ## datafusion/physical-expr/benches/in_list.rs: ## @@ -219,6 +222,144 @@ fn bench_realistic_mixed_strings( } } +/// Benchmarks the dynamic evaluation pa

Re: [PR] perf: Optimize `array_has_any()` with scalar arg [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on code in PR #20385: URL: https://github.com/apache/datafusion/pull/20385#discussion_r2833570925 ## datafusion/functions-nested/src/array_has.rs: ## @@ -476,6 +483,179 @@ fn array_has_any_inner(args: &[ArrayRef]) -> Result { array_has_all_and_any_inne

Re: [PR] Extend dynamic filter pushdown to Left and LeftSemi hash joins [datafusion]

2026-02-20 Thread via GitHub
getChan commented on code in PR #20447: URL: https://github.com/apache/datafusion/pull/20447#discussion_r2833577082 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -738,7 +738,7 @@ impl HashJoinExec { } fn allow_join_dynamic_filter_pushdown(&self, conf

[PR] chore: Cleanup "!is_valid(i)" -> "is_null(i)" [datafusion]

2026-02-20 Thread via GitHub
neilconway opened a new pull request, #20453: URL: https://github.com/apache/datafusion/pull/20453 ## Which issue does this PR close? N/A ## Rationale for this change This makes the code easier to read; per suggestion from @Jefffrey in code review for a different change.

Re: [PR] chore: Cleanup "!is_valid(i)" -> "is_null(i)" [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on PR #20453: URL: https://github.com/apache/datafusion/pull/20453#issuecomment-3935764125 cc @Jefffrey

Re: [PR] Add workflow to verify release candidate on multiple systems [datafusion-python]

2026-02-20 Thread via GitHub
kevinjqliu commented on PR #1388: URL: https://github.com/apache/datafusion-python/pull/1388#issuecomment-3935767056 @timsaucer Could you update the PR description with something like: ``` This PR adds a manually triggered GitHub Actions workflow to verify release candidates acro

Re: [PR] Add workflow to verify release candidate on multiple systems [datafusion-python]

2026-02-20 Thread via GitHub
kevinjqliu commented on PR #1388: URL: https://github.com/apache/datafusion-python/pull/1388#issuecomment-3935709635 I can make the changes above to the release and verify process. Just a note from an ASF perspective: it is allowed to verify releases with cloud machines, but must creat

[PR] perf: Pre-resolve type dispatch in sort-merge join comparators [datafusion]

2026-02-20 Thread via GitHub
andygrove opened a new pull request, #20452: URL: https://github.com/apache/datafusion/pull/20452 ## Summary - Replace per-row runtime `DataType` matching in `is_join_arrays_equal()` and `compare_join_arrays()` with a `JoinComparator` struct that resolves typed comparison function po
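The PR summary describes replacing per-row `DataType` matching with a comparator that resolves the typed comparison once. The sketch below shows that general technique only. `Column` is a hypothetical stand-in for typed Arrow arrays (it is not the arrow-rs API), and `PreResolvedComparator` is a hypothetical name, not the PR's `JoinComparator`.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for typed Arrow columns (NOT the arrow-rs API).
enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

// Per-row dispatch: the type `match` runs on every single comparison.
fn compare_per_row(left: &Column, right: &Column, l: usize, r: usize) -> Ordering {
    match (left, right) {
        (Column::Int64(a), Column::Int64(b)) => a[l].cmp(&b[r]),
        (Column::Utf8(a), Column::Utf8(b)) => a[l].cmp(&b[r]),
        _ => panic!("mismatched column types"),
    }
}

// Pre-resolved dispatch (hypothetical type): the `match` runs once, in `new`,
// and stores a closure that already knows the concrete element type.
struct PreResolvedComparator<'a> {
    cmp: Box<dyn Fn(usize, usize) -> Ordering + 'a>,
}

impl<'a> PreResolvedComparator<'a> {
    fn new(left: &'a Column, right: &'a Column) -> Self {
        let cmp: Box<dyn Fn(usize, usize) -> Ordering + 'a> = match (left, right) {
            (Column::Int64(a), Column::Int64(b)) => {
                Box::new(move |l: usize, r: usize| a[l].cmp(&b[r]))
            }
            (Column::Utf8(a), Column::Utf8(b)) => {
                Box::new(move |l: usize, r: usize| a[l].cmp(&b[r]))
            }
            _ => panic!("mismatched column types"),
        };
        PreResolvedComparator { cmp }
    }

    // Ordering between row `l` of the left column and row `r` of the right one.
    fn compare(&self, l: usize, r: usize) -> Ordering {
        (self.cmp)(l, r)
    }

    // Equality check built on the same pre-resolved closure.
    fn is_equal(&self, l: usize, r: usize) -> bool {
        self.compare(l, r) == Ordering::Equal
    }
}

fn main() {
    let left = Column::Int64(vec![1, 3, 5]);
    let right = Column::Int64(vec![3, 4, 5]);
    let comparator = PreResolvedComparator::new(&left, &right);
    for i in 0..3 {
        // Both paths must agree; only where the type dispatch happens differs.
        assert_eq!(comparator.compare(i, i), compare_per_row(&left, &right, i, i));
    }
    println!("{}", comparator.is_equal(2, 2)); // prints "true"
}
```

The trade-off in this sketch: an enum `match` per row is replaced by one indirect call through a boxed closure, which removes repeated type checks from the hot loop.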

Re: [PR] Hash join buffering on probe side [datafusion]

2026-02-20 Thread via GitHub
gabotechs commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3935748767 > Would it make sense to detect an empty build side right after collect_build_side completes, and for join types where empty build --> empty output , drop the probe stream immediat

Re: [PR] Allow custom OptimizerHints [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
iffyio merged PR #2216: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2216

Re: [PR] Add workflow to verify release candidate on multiple systems [datafusion-python]

2026-02-20 Thread via GitHub
kevinjqliu commented on PR #1388: URL: https://github.com/apache/datafusion-python/pull/1388#issuecomment-3935775713 As a follow up, I can take a look at how to enable this for Windows (I think it requires a few minor changes to `dev/release/verify-release-candidate.sh`) If this is h

Re: [PR] feat: Implement Spark `bitmap_bucket_number` function [datafusion]

2026-02-20 Thread via GitHub
kazantsev-maksim commented on PR #20288: URL: https://github.com/apache/datafusion/pull/20288#issuecomment-3935833990 Thanks @Jefffrey, resolved conflicts.

Re: [PR] perf: Optimize `array_has_any()` with scalar arg [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on PR #20385: URL: https://github.com/apache/datafusion/pull/20385#issuecomment-3935833873 @Jefffrey Thank you for the detailed code review! 🙏 I addressed all of your comments; please let me know if you have more feedback.

Re: [PR] fix: prevent duplicate alias collision with user-provided __datafusion_extracted names [datafusion]

2026-02-20 Thread via GitHub
cetra3 commented on code in PR #20432: URL: https://github.com/apache/datafusion/pull/20432#discussion_r2834000856 ## datafusion/optimizer/src/extract_leaf_expressions.rs: ## @@ -127,10 +123,45 @@ impl OptimizerRule for ExtractLeafExpressions { return Ok(Transformed

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-20 Thread via GitHub
adriangb merged PR #20414: URL: https://github.com/apache/datafusion/pull/20414

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-20 Thread via GitHub
adriangb commented on PR #20414: URL: https://github.com/apache/datafusion/pull/20414#issuecomment-3935894591 Thank you, Tim! I'll send this to merge, and if any issues pop up or there are ergonomic improvements, I will handle them as a follow-up.

Re: [I] ClickHouse: Parsing error for WITH statement using scalar expressions [datafusion-sqlparser-rs]

2026-02-20 Thread via GitHub
alrevuelta commented on issue #2221: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2221#issuecomment-3935923384 also interested in this.

Re: [PR] Fix: map_from_arrays() with NULL inputs causes native crash [datafusion-comet]

2026-02-20 Thread via GitHub
kazantsev-maksim commented on code in PR #3356: URL: https://github.com/apache/datafusion-comet/pull/3356#discussion_r2834081766 ## spark/src/test/resources/sql-tests/expressions/map/map_from_arrays.sql: ## @@ -26,9 +26,7 @@ INSERT INTO test_map_from_arrays VALUES (array('a', 'b

Re: [PR] perf: Fix quadratic behavior of `to_array_of_size` [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on PR #20459: URL: https://github.com/apache/datafusion/pull/20459#issuecomment-3937883176 I see #18159 already exists for this issue; I'll be optimistic and claim this PR closes it... 😅

Re: [PR] perf: Fix quadratic behavior of `to_array_of_size` [datafusion]

2026-02-20 Thread via GitHub
neilconway commented on PR #20459: URL: https://github.com/apache/datafusion/pull/20459#issuecomment-3937887152 We could consider backing out the special-case logic in NLJ that was introduced in #18161, but that will require some consideration and benchmarking first.

Re: [I] Support filter pushdown through `SortMergeJoinExec` [datafusion]

2026-02-20 Thread via GitHub
petern48 commented on issue #20443: URL: https://github.com/apache/datafusion/issues/20443#issuecomment-3938028946 @mdashti No worries at all, I hadn't started yet anyway. Happy to leave it for you to tackle.

Re: [I] Add details for dropping qualified columns [datafusion-python]

2026-02-20 Thread via GitHub
Prathamesh9284 commented on issue #1340: URL: https://github.com/apache/datafusion-python/issues/1340#issuecomment-3937954783 Yes. I’ll pick this up once 53 is available in core.
