Re: [PR] feat: placeholders in execution plans [datafusion]

2026-02-13 Thread via GitHub
LLDay commented on code in PR #20169: URL: https://github.com/apache/datafusion/pull/20169#discussion_r2803520870 ## datafusion/physical-expr/src/expressions/placeholder.rs: ## @@ -0,0 +1,126 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] feat: placeholders in execution plans [datafusion]

2026-02-13 Thread via GitHub
LLDay commented on code in PR #20169: URL: https://github.com/apache/datafusion/pull/20169#discussion_r2803522046 ## datafusion/physical-expr/src/planner.rs: ## @@ -288,16 +288,28 @@ pub fn create_physical_expr( }; Ok(expressions::case(expr, when_th

Re: [PR] CI: build and run sqllogictests binary directly in extended workflow [datafusion]

2026-02-13 Thread via GitHub
kosiew commented on code in PR #20282: URL: https://github.com/apache/datafusion/pull/20282#discussion_r2803523268 ## .github/workflows/extended.yml: ## @@ -167,11 +167,19 @@ jobs: uses: ./.github/actions/setup-builder with: rust-version: stable +

Re: [PR] Allow custom OptimizerHints [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
xitep commented on PR #2216: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2216#issuecomment-3896662765 i agree it will make the maintenance of `sqlparser` simpler by pushing the responsibilities downstream. and yes, in this case, with the optimizer hints, it is straightforwa

Re: [PR] CI: build and run sqllogictests binary directly in extended workflow [datafusion]

2026-02-13 Thread via GitHub
kosiew commented on code in PR #20282: URL: https://github.com/apache/datafusion/pull/20282#discussion_r2803992077 ## .github/workflows/extended.yml: ## @@ -167,11 +167,19 @@ jobs: uses: ./.github/actions/setup-builder with: rust-version: stable +

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
Dandandan commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895683330 run benchmark tpch tpcds -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895731808 Benchmark script failed with exit code 101. Last 10 lines of output: Click to expand ``` CARGO_COMMAND: cargo run --release PREFER_HASH_JOIN: true ***

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895732179 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Basic Extension Type Registry Implementation [datafusion]

2026-02-13 Thread via GitHub
tobixdev commented on code in PR #20312: URL: https://github.com/apache/datafusion/pull/20312#discussion_r2803017335 ## datafusion/common/src/types/extension.rs: ## @@ -0,0 +1,71 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Basic Extension Type Registry Implementation [datafusion]

2026-02-13 Thread via GitHub
tobixdev commented on code in PR #20312: URL: https://github.com/apache/datafusion/pull/20312#discussion_r2803054679 ## datafusion/common/src/types/extension.rs: ## @@ -0,0 +1,71 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [I] Iceberg Table Maintenance: Acceleration Opportunities [datafusion-comet]

2026-02-13 Thread via GitHub
Shekharrajak commented on issue #3371: URL: https://github.com/apache/datafusion-comet/issues/3371#issuecomment-3895787934 @parthchandra , we do not have a dataset as such, but we are planning to have benchmarks, something like CometIcebergMaintenanceBenchmark - we have a bunch of benchmarks already h

[PR] TEST cargo test, no binary build step [datafusion]

2026-02-13 Thread via GitHub
kosiew opened a new pull request, #20338: URL: https://github.com/apache/datafusion/pull/20338 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] feat: add support `crc32` expression [datafusion-comet]

2026-02-13 Thread via GitHub
rafafrdz commented on PR #3498: URL: https://github.com/apache/datafusion-comet/pull/3498#issuecomment-3896074509 @andygrove Sorry, I missed pushing the commit where I register the UDF... 😅 Also I built and tested locally onto the matrix. For that, I'm using `mise` to handle the different ve

Re: [PR] Add schema-aware CastColumnExpr with owned cast/format options for safe struct casting [datafusion]

2026-02-13 Thread via GitHub
kosiew commented on PR #20202: URL: https://github.com/apache/datafusion/pull/20202#issuecomment-3896222554 @adriangb > review difficulty grows exponentially with PR size. Please ignore the diff in these 2 files. https://github.com/user-attachments/assets/38b73dbf-b3e1-

Re: [PR] fix: add scalar support for bit_count expression [datafusion-comet]

2026-02-13 Thread via GitHub
parthchandra commented on PR #3361: URL: https://github.com/apache/datafusion-comet/pull/3361#issuecomment-3896208036 Could you rebase? Otherwise this looks good.

Re: [PR] Allow custom OptimizerHints [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
altmannmarcelo commented on PR #2216: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2216#issuecomment-3896245556 Thanks for the review @xitep, and glad you like the generalization! I understand the appeal of dialect flags — they're great when the parser can definitive

Re: [PR] Add statistics integration tests [datafusion]

2026-02-13 Thread via GitHub
gabotechs commented on PR #20292: URL: https://github.com/apache/datafusion/pull/20292#issuecomment-3895565661 > Thanks @gabotechs I think you can use already created tpcds files, please refer to https://github.com/apache/datafusion/blob/main/benchmarks/README.md#tpcds-1 I'm looking

Re: [PR] Hash join buffering on probe side [datafusion]

2026-02-13 Thread via GitHub
LiaCastaneda commented on code in PR #19761: URL: https://github.com/apache/datafusion/pull/19761#discussion_r2802909529 ## datafusion/physical-plan/src/buffer.rs: ## @@ -0,0 +1,626 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Hash join buffering on probe side [datafusion]

2026-02-13 Thread via GitHub
gabotechs commented on code in PR #19761: URL: https://github.com/apache/datafusion/pull/19761#discussion_r2802916139 ## datafusion/physical-plan/src/buffer.rs: ## @@ -0,0 +1,626 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Snowflake: Lambda functions [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
iffyio merged PR #2192: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2192

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
Dandandan commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895650725 run benchmark tpch tpcds

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895651389 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Basic Extension Type Registry Implementation [datafusion]

2026-02-13 Thread via GitHub
tobixdev commented on code in PR #20312: URL: https://github.com/apache/datafusion/pull/20312#discussion_r2802994840 ## datafusion/expr/src/registry.rs: ## @@ -215,3 +218,266 @@ impl FunctionRegistry for MemoryFunctionRegistry { self.udwfs.keys().cloned().collect()

Re: [PR] Implement ExecutionPlan::expressions() [datafusion]

2026-02-13 Thread via GitHub
LiaCastaneda commented on code in PR #20337: URL: https://github.com/apache/datafusion/pull/20337#discussion_r2802852566 ## datafusion/core/tests/physical_optimizer/filter_pushdown.rs: ## @@ -3868,3 +3868,103 @@ async fn test_filter_with_projection_pushdown() { ]; asse

[PR] Implement expressions() [datafusion]

2026-02-13 Thread via GitHub
LiaCastaneda opened a new pull request, #20337: URL: https://github.com/apache/datafusion/pull/20337 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895842159 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

[PR] TEST build binary, use `cargo test` [datafusion]

2026-02-13 Thread via GitHub
kosiew opened a new pull request, #20339: URL: https://github.com/apache/datafusion/pull/20339 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895841802 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow_58 Benchmark tpch_sf1.json ┏

Re: [PR] feat(datafusion-cli): enhance CLI helper with default hint [datafusion]

2026-02-13 Thread via GitHub
dariocurr commented on PR #20310: URL: https://github.com/apache/datafusion/pull/20310#issuecomment-3895916918 ![ScreenRecording2026-02-13at10 17 21-ezgif com-video-to-gif-converter](https://github.com/user-attachments/assets/8caaacab-c720-4aa3-9af7-cdb2b3b35a25)

Re: [PR] WIP: Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 13.0.0 [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3895914003 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow_58 Benchmark tpcds_sf1.json

Re: [PR] feat(datafusion-cli): enhance CLI helper with default hint [datafusion]

2026-02-13 Thread via GitHub
dariocurr commented on PR #20310: URL: https://github.com/apache/datafusion/pull/20310#issuecomment-3895952378 ![datafusion](https://github.com/user-attachments/assets/090edd22-4c2e-43d5-95de-5ac7daa20b2a)

Re: [PR] Fix Python UDAF list-of-timestamps return by enforcing list-valued scalars and caching PyArrow types [datafusion-python]

2026-02-13 Thread via GitHub
timsaucer commented on PR #1347: URL: https://github.com/apache/datafusion-python/pull/1347#issuecomment-3897116364 @kosiew If you're okay merging my PR https://github.com/kosiew/datafusion-python/pull/7 into this branch, I think this PR is the last thing we have left before we start the r

Re: [PR] feat: add support for generating JSON formatted substrait plan [datafusion-python]

2026-02-13 Thread via GitHub
timsaucer commented on PR #1376: URL: https://github.com/apache/datafusion-python/pull/1376#issuecomment-3897068057 Thank you for the PR. There was an issue where the internal function name differed from the wrapper, so unit tests were failing. I pushed an update correcting this and making

Re: [PR] Enable parquet filter pushdown by default [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19477: URL: https://github.com/apache/datafusion/pull/19477#issuecomment-3897274038 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_enable_pushdown Benchmark tpch_sf1.json ┏

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-13 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2801905807 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -417,6 +419,14 @@ impl FileOpener for ParquetOpener { predicate = predicate

Re: [PR] Enable parquet filter pushdown by default [datafusion]

2026-02-13 Thread via GitHub
Dandandan commented on PR #19477: URL: https://github.com/apache/datafusion/pull/19477#issuecomment-3897194969 run benchmark tpch

Re: [PR] Enable parquet filter pushdown by default [datafusion]

2026-02-13 Thread via GitHub
Dandandan commented on PR #19477: URL: https://github.com/apache/datafusion/pull/19477#issuecomment-3897200116 > > Huh, there still seems to be, no? : > > > > > 🤔 there wasn't on some previous run 🤔 tpch_mem doesn't have filter pushdown ;) The tpch (from parquet) and tpcd

Re: [PR] Enable parquet filter pushdown by default [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19477: URL: https://github.com/apache/datafusion/pull/19477#issuecomment-3897195188 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-13 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3897208669 As part of the epic, we should also look at the dynamic join filters which slows down the tpch/tpcds benchmarks with filter pushdown. https://github.com/apache/datafusio

Re: [PR] Optimize Clickbench Query 29 by adding a new Optimizer rule [datafusion]

2026-02-13 Thread via GitHub
devanshu0987 commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3897297790 Hi @UBarney @alamb Thanks for sharing your thought process on why it should be better to write a `simplify` rather than an optimizer rule. Helps me reorient my understandi

Re: [PR] Cache `PlanProperties`, add fast-path for `with_new_children` [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19792: URL: https://github.com/apache/datafusion/pull/19792#issuecomment-3899704942 🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running Linux aal-dev 6.

[I] Add Python bindings for accessing ExecutionMetrics [datafusion-python]

2026-02-13 Thread via GitHub
ShreyeshArangath opened a new issue, #1379: URL: https://github.com/apache/datafusion-python/issues/1379 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** DataFusion Python currently provides execution metrics only through the

Re: [PR] perf: Optimize scalar fast path of atan2 [datafusion]

2026-02-13 Thread via GitHub
neilconway commented on code in PR #20336: URL: https://github.com/apache/datafusion/pull/20336#discussion_r2806788520 ## datafusion/functions/src/macros.rs: ## @@ -393,37 +394,76 @@ macro_rules! make_math_binary_udf { &self, args: Scala

Re: [PR] Updated `parse_infix(..)` in `mysql.rs` and `sqlite.rs` to handle error rather than `unwrap()` [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
iffyio merged PR #2207: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2207

Re: [I] Iceberg Table Maintenance: Acceleration Opportunities [datafusion-comet]

2026-02-13 Thread via GitHub
Shekharrajak commented on issue #3371: URL: https://github.com/apache/datafusion-comet/issues/3371#issuecomment-3901273572 Actually we can use the TPCDS and create fragmented tables - insert in batches, such that we have enough number of rows, number of files created to analyse the benchma

Re: [PR] MSSQL: Add support for WAITFOR statement [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
iffyio commented on code in PR #2210: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2210#discussion_r2807131022 ## tests/sqlparser_mssql.rs: ## @@ -1702,6 +1702,43 @@ fn test_parse_throw() { ); } +#[test] +fn test_parse_waitfor() { +// WAITFOR DELAY +

Re: [PR] Fixed select dollar column from stage for snowflake [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
iffyio merged PR #2165: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2165

Re: [PR] MSSQL: Add support for TRAN shorthand [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
iffyio merged PR #2212: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2212

Re: [I] Unify the Prettier versions [datafusion]

2026-02-13 Thread via GitHub
cj-zhukov commented on issue #20024: URL: https://github.com/apache/datafusion/issues/20024#issuecomment-3901289861 done in https://github.com/apache/datafusion/pull/20311

Re: [I] Unify the Prettier versions [datafusion]

2026-02-13 Thread via GitHub
cj-zhukov closed issue #20024: Unify the Prettier versions URL: https://github.com/apache/datafusion/issues/20024

[PR] perf: Cache remapped expression in DynamicFilterPhysicalExpr::current() [datafusion]

2026-02-13 Thread via GitHub
adriangb opened a new pull request, #20353: URL: https://github.com/apache/datafusion/pull/20353 ## Summary - Cache the result of `remap_children()` in `DynamicFilterPhysicalExpr::current()` to avoid redundant `transform_up()` tree walks on every per-batch call (`evaluate()`, `snapsh

[PR] perf: Optimize scalar fast path for `regexp_like` [datafusion]

2026-02-13 Thread via GitHub
kumarUjjawal opened a new pull request, #20354: URL: https://github.com/apache/datafusion/pull/20354 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion-comet/issues/2986 ## Rationale for this change `regexp_like` was converting scal

Re: [PR] perf: Optimize scalar fast path for `regexp_like` [datafusion]

2026-02-13 Thread via GitHub
Jefffrey commented on code in PR #20354: URL: https://github.com/apache/datafusion/pull/20354#discussion_r2807110950 ## datafusion/functions/src/regex/regexplike.rs: ## @@ -130,6 +133,13 @@ impl ScalarUDFImpl for RegexpLikeFunc { args: datafusion_expr::ScalarFunctionArg

Re: [PR] Cache `PlanProperties`, add fast-path for `with_new_children` [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #19792: URL: https://github.com/apache/datafusion/pull/19792#issuecomment-3900022235 🤖: Benchmark completed Details ``` group askalt_with_new_children_fast_path main -

Re: [PR] perf: Cache remapped expression in DynamicFilterPhysicalExpr::current() [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #20353: URL: https://github.com/apache/datafusion/pull/20353#issuecomment-3900303939 🤖: Benchmark completed Details ``` Comparing HEAD and cache-current-dynamic Benchmark tpcds_sf1.json

Re: [PR] Fixed select dollar column from stage for snowflake [datafusion-sqlparser-rs]

2026-02-13 Thread via GitHub
romanoff commented on PR #2165: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2165#issuecomment-385060 @iffyio Thank you. Updated

Re: [PR] Reduce ExtractLeafExpressions optimizer overhead with fast pre-scan [datafusion]

2026-02-13 Thread via GitHub
adriangb merged PR #20341: URL: https://github.com/apache/datafusion/pull/20341

Re: [PR] Fix massive spill files for StringView/BinaryView columns [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on PR #19444: URL: https://github.com/apache/datafusion/pull/19444#issuecomment-3900041410 Hi @EeshanBembi just pinging here, I'd love to see this across the line, just missing one final push!

Re: [PR] refactor: Change TableScan.projection from indices to expressions [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on PR #20091: URL: https://github.com/apache/datafusion/pull/20091#issuecomment-397341 @alamb my main thought was that: 1. This would allow more closely mirroring the physical API 2. This would allow an optimizer to distinguish pushing down from pushing down into

Re: [PR] refactor: Change TableScan.projection from indices to expressions [datafusion]

2026-02-13 Thread via GitHub
adriangb closed pull request #20091: refactor: Change TableScan.projection from indices to expressions URL: https://github.com/apache/datafusion/pull/20091

Re: [PR] Consolidate filters and projections onto `TableScan` [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on PR #20061: URL: https://github.com/apache/datafusion/pull/20061#issuecomment-398176 My main thought was that: 1. This would allow more closely mirroring the physical API 2. This would allow an optimizer to distinguish pushing down from pushing down into a scan

Re: [PR] Consolidate filters and projections onto `TableScan` [datafusion]

2026-02-13 Thread via GitHub
adriangb closed pull request #20061: Consolidate filters and projections onto `TableScan` URL: https://github.com/apache/datafusion/pull/20061

Re: [PR] Consolidate filters and projections onto `TableScan` [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on PR #20061: URL: https://github.com/apache/datafusion/pull/20061#issuecomment-399677 I do still think there's something to be said for the filters only refactor here... I just don't have time or motivation to see it across the line.

Re: [PR] Port regex_extract [datafusion]

2026-02-13 Thread via GitHub
comphead commented on PR #20308: URL: https://github.com/apache/datafusion/pull/20308#issuecomment-3900312080 From what I remember it was quite complicated to expose rust backed regexp into JVM world, because of rust/jvm regexp processing difference. The major ones: - no backtracking

Re: [PR] chore: Cast module refactor boolean module [datafusion-comet]

2026-02-13 Thread via GitHub
coderfender commented on PR #3491: URL: https://github.com/apache/datafusion-comet/pull/3491#issuecomment-3900320222 @andygrove , given that we removed dead cast code, I believe this PR is ready for a review . I was wondering if you would prefer splitting the PR into smaller PRs (one for b

Re: [I] Add Python bindings for accessing ExecutionMetrics [datafusion-python]

2026-02-13 Thread via GitHub
ShreyeshArangath commented on issue #1379: URL: https://github.com/apache/datafusion-python/issues/1379#issuecomment-3900446432 I'd like to work on this!

Re: [PR] Validate coerce int96 config 17498 [datafusion]

2026-02-13 Thread via GitHub
AlyAbdelmoneim commented on code in PR #20253: URL: https://github.com/apache/datafusion/pull/20253#discussion_r2806732137 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1342,18 +1344,18 @@ mod tests { let time_units_and_expected = vec![

Re: [PR] Validate coerce int96 config 17498 [datafusion]

2026-02-13 Thread via GitHub
Jefffrey commented on code in PR #20253: URL: https://github.com/apache/datafusion/pull/20253#discussion_r2806730961 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1342,18 +1344,18 @@ mod tests { let time_units_and_expected = vec![ ( -

Re: [PR] Port regex_extract [datafusion]

2026-02-13 Thread via GitHub
Jefffrey commented on code in PR #20308: URL: https://github.com/apache/datafusion/pull/20308#discussion_r2806741297 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,551 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] Iceberg Table Maintenance: Acceleration Opportunities [datafusion-comet]

2026-02-13 Thread via GitHub
parthchandra commented on issue #3371: URL: https://github.com/apache/datafusion-comet/issues/3371#issuecomment-3901053824 @Shekharrajak I don't know if TPC-DS static data sets would be sufficient to benchmark Iceberg table maintenance operations which typically involve a bunch of updates

Re: [I] Native engine crashes on literal sha2() with 'Unsupported argument types' [datafusion-comet]

2026-02-13 Thread via GitHub
parthchandra commented on issue #3340: URL: https://github.com/apache/datafusion-comet/issues/3340#issuecomment-3901085831 @0lai0 why not try the native code path first and if that fails, fallback to spark.

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-13 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3900332202 > > As part of the epic, we should also look at the dynamic join filters which slows down the tpch/tpcds benchmarks with filter pushdown. > > [#20318 (comment)](https://git

Re: [PR] Port regex_extract [datafusion]

2026-02-13 Thread via GitHub
Omega359 commented on PR #20308: URL: https://github.com/apache/datafusion/pull/20308#issuecomment-3897886637 > cc @Omega359 @comphead did we ever land on a consensus regarding `regexp_extract` and `regexp_substr`? We had some PRs for them before and they seemed to lapse, but looks like the

Re: [PR] chore: remove some dead cast code [datafusion-comet]

2026-02-13 Thread via GitHub
mbutrovich merged PR #3513: URL: https://github.com/apache/datafusion-comet/pull/3513

Re: [PR] feat: CometNativeScan per-partition plan serde [datafusion-comet]

2026-02-13 Thread via GitHub
andygrove merged PR #3511: URL: https://github.com/apache/datafusion-comet/pull/3511

Re: [PR] Implement ExecutionPlan::expressions() [datafusion]

2026-02-13 Thread via GitHub
LiaCastaneda commented on code in PR #20337: URL: https://github.com/apache/datafusion/pull/20337#discussion_r2805366178 ## datafusion/core/tests/physical_optimizer/filter_pushdown.rs: ## @@ -3868,3 +3868,103 @@ async fn test_filter_with_projection_pushdown() { ]; asse

Re: [PR] Wrap immutable plan parts into Arc (make creating `ExecutionPlan`s less costly) [datafusion]

2026-02-13 Thread via GitHub
alamb commented on code in PR #19893: URL: https://github.com/apache/datafusion/pull/19893#discussion_r2805326958 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -687,26 +797,14 @@ impl HashJoinExec { /// Return new instance of [HashJoinExec] with the given

Re: [PR] Implement ExecutionPlan::expressions() [datafusion]

2026-02-13 Thread via GitHub
LiaCastaneda commented on code in PR #20337: URL: https://github.com/apache/datafusion/pull/20337#discussion_r2805390030 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -204,6 +204,17 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// joins). fn

Re: [PR] Implement ExecutionPlan::expressions() [datafusion]

2026-02-13 Thread via GitHub
LiaCastaneda commented on PR #20337: URL: https://github.com/apache/datafusion/pull/20337#issuecomment-3898510714 Thanks both for the reviews! I will work on your suggestion @alamb

Re: [PR] Implement ExecutionPlan::expressions() [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on code in PR #20337: URL: https://github.com/apache/datafusion/pull/20337#discussion_r2805393192 ## datafusion/core/tests/physical_optimizer/filter_pushdown.rs: ## @@ -3868,3 +3868,103 @@ async fn test_filter_with_projection_pushdown() { ]; assert_b

Re: [PR] fix: IS NULL doesn't type-check its input and panic [datafusion]

2026-02-13 Thread via GitHub
neilconway commented on PR #20306: URL: https://github.com/apache/datafusion/pull/20306#issuecomment-3898530434 Digging into this a bit further, this fix seems cleaner to me: ``` diff --git a/datafusion/optimizer/src/analyzer/type_coercion.rs b/datafusion/optimizer/src/analyzer/typ

[PR] Reduce ExtractLeafExpressions optimizer overhead with fast pre-scan [datafusion]

2026-02-13 Thread via GitHub
adriangb opened a new pull request, #20341: URL: https://github.com/apache/datafusion/pull/20341 ## Summary Follow-up to #20117 which added the `ExtractLeafExpressions` and `PushDownLeafProjections` optimizer rules for get_field pushdown. Benchmarking revealed that these rules

[PR] Avoid HashJoinExecBuilder when computing properties [datafusion]

2026-02-13 Thread via GitHub
alamb opened a new pull request, #20340: URL: https://github.com/apache/datafusion/pull/20340 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/19893 ## Rationale for this change @2010YOUY01 noted in https://github.com/apache/dat

Re: [PR] Reduce ExtractLeafExpressions optimizer overhead with fast pre-scan [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on PR #20341: URL: https://github.com/apache/datafusion/pull/20341#issuecomment-3898532058 run benchmark sql_planner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Reduce ExtractLeafExpressions optimizer overhead with fast pre-scan [datafusion]

2026-02-13 Thread via GitHub
alamb-ghbot commented on PR #20341: URL: https://github.com/apache/datafusion/pull/20341#issuecomment-3898532798 🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running Linux aal-dev 6.

[I] Incorrect type coercion for IS SIMILAR TO [datafusion]

2026-02-13 Thread via GitHub
neilconway opened a new issue, #20342: URL: https://github.com/apache/datafusion/issues/20342 ### Describe the bug ``` -- works SELECT CAST('hello' AS BYTEA) LIKE 'hello%'; -- fails ("Cannot infer common argument type for regex operation Binary ~ Utf8") SELECT CAST

Re: [PR] Avoid HashJoinExecBuilder in HashJoinExec::with_projection [datafusion]

2026-02-13 Thread via GitHub
alamb commented on code in PR #20340: URL: https://github.com/apache/datafusion/pull/20340#discussion_r2805410887 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -839,9 +839,35 @@ impl HashJoinExec { can_project(&self.schema(), projection.as_deref())?;

Re: [PR] fix: IS NULL doesn't type-check its input and panic [datafusion]

2026-02-13 Thread via GitHub
neilconway commented on code in PR #20306: URL: https://github.com/apache/datafusion/pull/20306#discussion_r2805417152 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -744,18 +744,47 @@ impl TreeNodeRewriter for TypeCoercionRewriter<'_> { });

Re: [I] Incorrect type coercion for IS SIMILAR TO [datafusion]

2026-02-13 Thread via GitHub
neilconway commented on issue #20342: URL: https://github.com/apache/datafusion/issues/20342#issuecomment-3898537816 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Wrap immutable plan parts into Arc (make creating `ExecutionPlan`s less costly) [datafusion]

2026-02-13 Thread via GitHub
alamb commented on code in PR #19893: URL: https://github.com/apache/datafusion/pull/19893#discussion_r2805425860 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -687,26 +797,14 @@ impl HashJoinExec { /// Return new instance of [HashJoinExec] with the given

Re: [PR] feat(datafusion-cli): enhance CLI helper with default hint [datafusion]

2026-02-13 Thread via GitHub
alamb commented on PR #20310: URL: https://github.com/apache/datafusion/pull/20310#issuecomment-3899539587 I tried it out locally and it seems to work great for me: https://github.com/user-attachments/assets/2c6bb80f-d218-4ab2-918f-396083569770 -- This is an automated mess

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-13 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2806125820 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -46,6 +47,9 @@ impl FilterState { } } +/// Per-partition filter expressions i

Re: [PR] perf: Optimize concat() UDF [datafusion]

2026-02-13 Thread via GitHub
alamb commented on code in PR #20317: URL: https://github.com/apache/datafusion/pull/20317#discussion_r2806155596 ## datafusion/functions/src/string/concat.rs: ## @@ -207,7 +207,11 @@ impl ScalarUDFImpl for ConcatFunc { DataType::Utf8View => {

Re: [PR] feat: Add TimestampNTZType support for casts and unix_timestamp [datafusion-comet]

2026-02-13 Thread via GitHub
andygrove commented on PR #3253: URL: https://github.com/apache/datafusion-comet/pull/3253#issuecomment-3899598631 I am moving this to draft until the DataFusion 52 upgrade is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] fix: handle out of range errors in DATE_BIN instead of panicking [datafusion]

2026-02-13 Thread via GitHub
mishop-15 commented on code in PR #20221: URL: https://github.com/apache/datafusion/pull/20221#discussion_r2806191699 ## datafusion/functions/src/datetime/date_bin.rs: ## @@ -295,7 +295,7 @@ impl ScalarUDFImpl for DateBinFunc { const NANOS_PER_MICRO: i64 = 1_000; const NANOS_P

Re: [PR] feat: Add TimestampNTZType support for casts and unix_timestamp [datafusion-comet]

2026-02-13 Thread via GitHub
andygrove commented on PR #3253: URL: https://github.com/apache/datafusion-comet/pull/3253#issuecomment-3899626486 > What is the expected behavior when data is written to a timestamp ntz field with one session timezone and read by another user with a different session timezone? Can we upda

Re: [PR] Add schema-aware CastColumnExpr with owned cast/format options for safe struct casting [datafusion]

2026-02-13 Thread via GitHub
adriangb commented on code in PR #20202: URL: https://github.com/apache/datafusion/pull/20202#discussion_r2804845043 ## datafusion/proto/proto/datafusion.proto: ## @@ -896,6 +896,8 @@ message PhysicalExprNode { UnknownColumn unknown_column = 20; PhysicalHashExprNode
