Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-17 Thread via GitHub
Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3912989383 > We should detect constant true filters Testing in https://github.com/apache/datafusion/pulls Perhaps we should make it a bit smarter so it will do this at the row

Re: [PR] feat: optimize CASE WHEN for divide-by-zero protection pattern [datafusion]

2026-02-17 Thread via GitHub
pepijnve commented on PR #19994: URL: https://github.com/apache/datafusion/pull/19994#issuecomment-3913031913 Since `scatter` is currently implemented in DataFusion, keeping it there might be the simplest. It would probably make sense to migrate this to arrow-rs eventually if it can be made

[PR] chore(deps): bump taiki-e/install-action from 2.67.27 to 2.68.0 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20398: URL: https://github.com/apache/datafusion/pull/20398 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.67.27 to 2.68.0. Release notes Sourced from https://github.com/taiki-e/install-action/releases

[PR] chore(deps): bump actions/stale from 10.1.1 to 10.2.0 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20397: URL: https://github.com/apache/datafusion/pull/20397 Bumps [actions/stale](https://github.com/actions/stale) from 10.1.1 to 10.2.0. Release notes Sourced from [actions/stale's releases](https://github.com/actions/stale/releases).

Re: [PR] feat: optimize CASE WHEN for divide-by-zero protection pattern [datafusion]

2026-02-17 Thread via GitHub
CuteChuanChuan commented on PR #19994: URL: https://github.com/apache/datafusion/pull/19994#issuecomment-3913069578 > Since `scatter` is currently implemented in DataFusion, keeping it there might be the simplest. It would probably make sense to migrate this to arrow-rs eventually if it can

[PR] chore(deps): bump maturin from 1.11.5 to 1.12.2 in /docs [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20400: URL: https://github.com/apache/datafusion/pull/20400 Bumps [maturin](https://github.com/pyo3/maturin) from 1.11.5 to 1.12.2. Release notes Sourced from [maturin's releases](https://github.com/pyo3/maturin/releases). v1.12.2

[PR] chore(deps): bump env_logger from 0.11.8 to 0.11.9 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20402: URL: https://github.com/apache/datafusion/pull/20402 Bumps [env_logger](https://github.com/rust-cli/env_logger) from 0.11.8 to 0.11.9. Release notes Sourced from https://github.com/rust-cli/env_logger/releases env_logger's relea

[PR] chore(deps): bump uuid from 1.20.0 to 1.21.0 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20401: URL: https://github.com/apache/datafusion/pull/20401 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.20.0 to 1.21.0. Release notes Sourced from [uuid's releases](https://github.com/uuid-rs/uuid/releases). v1.21.0 What'

[I] User annotations in ExecutionPlan [datafusion]

2026-02-17 Thread via GitHub
gabotechs opened a new issue, #20396: URL: https://github.com/apache/datafusion/issues/20396 ### Is your feature request related to a problem or challenge? The main purpose of this issue is to gather information about whether there's appetite from the community for having the ability

Re: [PR] [TEST] Don't pushdown dynamic filters that are true on open [datafusion]

2026-02-17 Thread via GitHub
Dandandan commented on PR #20395: URL: https://github.com/apache/datafusion/pull/20395#issuecomment-3912980532 run benchmark clickbench_partitioned DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true -- This is an automated

[PR] chore(deps): bump syn from 2.0.114 to 2.0.116 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20399: URL: https://github.com/apache/datafusion/pull/20399 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.114 to 2.0.116. Release notes Sourced from [syn's releases](https://github.com/dtolnay/syn/releases). 2.0.116 Op

[PR] chore(deps): bump clap from 4.5.57 to 4.5.59 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20404: URL: https://github.com/apache/datafusion/pull/20404 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.57 to 4.5.59. Release notes Sourced from [clap's releases](https://github.com/clap-rs/clap/releases). v4.5.59 [4.5.

[PR] chore(deps): bump sqlparser from 0.60.0 to 0.61.0 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20408: URL: https://github.com/apache/datafusion/pull/20408 Bumps [sqlparser](https://github.com/apache/datafusion-sqlparser-rs) from 0.60.0 to 0.61.0. Commits https://github.com/apache/datafusion-sqlparser-rs/commit/272c25ed83b97cce5

[PR] chore(deps): bump aws-config from 1.8.13 to 1.8.14 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20407: URL: https://github.com/apache/datafusion/pull/20407 Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.13 to 1.8.14. Commits See full diff in [compare view](https://github.com/smithy-lang/smithy-rs/commits)

[PR] chore(deps): bump indicatif from 0.18.3 to 0.18.4 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20410: URL: https://github.com/apache/datafusion/pull/20410 Bumps [indicatif](https://github.com/console-rs/indicatif) from 0.18.3 to 0.18.4. Release notes Sourced from https://github.com/console-rs/indicatif/releases indicatif's relea

[PR] chore(deps): bump sysinfo from 0.38.1 to 0.38.2 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20411: URL: https://github.com/apache/datafusion/pull/20411 Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.38.1 to 0.38.2. Changelog Sourced from https://github.com/GuillaumeGomez/sysinfo/blob/main/CHANGELOG.md sysinf

[PR] chore(deps): bump rand_distr from 0.5.1 to 0.6.0 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20403: URL: https://github.com/apache/datafusion/pull/20403 Bumps [rand_distr](https://github.com/rust-random/rand_distr) from 0.5.1 to 0.6.0. Changelog Sourced from https://github.com/rust-random/rand_distr/blob/master/CHANGELOG.md ran

[PR] chore(deps): bump tonic from 0.14.3 to 0.14.4 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20406: URL: https://github.com/apache/datafusion/pull/20406 Bumps [tonic](https://github.com/hyperium/tonic) from 0.14.3 to 0.14.4. Release notes Sourced from [tonic's releases](https://github.com/hyperium/tonic/releases). v0.14.4

[PR] chore(deps): bump sqllogictest from 0.29.0 to 0.29.1 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20405: URL: https://github.com/apache/datafusion/pull/20405 Bumps [sqllogictest](https://github.com/risinglightdb/sqllogictest-rs) from 0.29.0 to 0.29.1. Release notes Sourced from https://github.com/risinglightdb/sqllogictest-rs/releases

[PR] chore(deps): bump liblzma from 0.4.5 to 0.4.6 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] opened a new pull request, #20409: URL: https://github.com/apache/datafusion/pull/20409 Bumps [liblzma](https://github.com/portable-network-archive/liblzma-rs) from 0.4.5 to 0.4.6. Release notes Sourced from https://github.com/portable-network-archive/liblzma-rs/rel

Re: [PR] feat: Adaptive query execution (AQE) planner fundamentals [datafusion-ballista]

2026-02-17 Thread via GitHub
milenkovicm commented on PR #1372: URL: https://github.com/apache/datafusion-ballista/pull/1372#issuecomment-3916775195 @danielhumanmod @killzoner @mattcuento I'd like to ask you for review if you have some time. Please note that this functionality is far from finished (or useful) but I'd

Re: [I] Optimize the evaluation of `DATE_TRUNC() == )` when pushed down [datafusion]

2026-02-17 Thread via GitHub
sdf-jkl commented on issue #18319: URL: https://github.com/apache/datafusion/issues/18319#issuecomment-3916823216 > [@sdf-jkl](https://github.com/sdf-jkl) I have a couple questions I'm hoping you can help me out with: > > 1. for `take_function_args` for `date_trunc` function, do I get

Re: [I] Optimize the evaluation of `DATE_TRUNC() == )` when pushed down [datafusion]

2026-02-17 Thread via GitHub
sdf-jkl commented on issue #18319: URL: https://github.com/apache/datafusion/issues/18319#issuecomment-3916856934 @alamb Is there a reason to use [`DateTruncGranularity`](https://github.com/apache/datafusion/blob/468b690d71350bc19c7e7bafd5dc61800973d91e/datafusion/functions/src/datetime/date

[PR] feat: Cast date to Int (No Op) [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender opened a new pull request, #3544: URL: https://github.com/apache/datafusion-comet/pull/3544 ## Which issue does this PR close? Closes #. This is a lightweight PR to return null when we cast from Date to Int / Long in Spark. Spark returns null so we do the sa

Re: [PR] feat: Cast date to Int (No Op) [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on PR #3544: URL: https://github.com/apache/datafusion-comet/pull/3544#issuecomment-3916849223 @andygrove, please take a look whenever you get a chance. Thank you.

Re: [PR] chore: DataFusion 52 migration [datafusion-comet]

2026-02-17 Thread via GitHub
comphead closed pull request #3470: chore: DataFusion 52 migration URL: https://github.com/apache/datafusion-comet/pull/3470

Re: [PR] chore: DataFusion 52 migration [datafusion-comet]

2026-02-17 Thread via GitHub
comphead commented on PR #3470: URL: https://github.com/apache/datafusion-comet/pull/3470#issuecomment-3916939876 Closing in favor of #3536

[I] Avoid representing uncorrelated scalar subqueries as joins [datafusion]

2026-02-17 Thread via GitHub
neilconway opened a new issue, #20415: URL: https://github.com/apache/datafusion/issues/20415 ### Is your feature request related to a problem or challenge? We currently implement uncorrelated scalar subqueries as joins. For example, the query from #18181 is (after inlining a CTE):

Re: [I] Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-02-17 Thread via GitHub
neilconway commented on issue #3781: URL: https://github.com/apache/datafusion/issues/3781#issuecomment-3917157685 I'd like to take this on! If anyone has bandwidth to help with reviewing, that would be great -- I'd love to get some high-level feedback before writing code.

Re: [I] Avoid representing uncorrelated scalar subqueries as joins [datafusion]

2026-02-17 Thread via GitHub
neilconway closed issue #20415: Avoid representing uncorrelated scalar subqueries as joins URL: https://github.com/apache/datafusion/issues/20415

Re: [I] Avoid representing uncorrelated scalar subqueries as joins [datafusion]

2026-02-17 Thread via GitHub
neilconway commented on issue #20415: URL: https://github.com/apache/datafusion/issues/20415#issuecomment-3917158883 Closing in favor of #3781

Re: [I] Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-02-17 Thread via GitHub
neilconway commented on issue #3781: URL: https://github.com/apache/datafusion/issues/3781#issuecomment-3917159911 take

Re: [I] Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-02-17 Thread via GitHub
neilconway commented on issue #3781: URL: https://github.com/apache/datafusion/issues/3781#issuecomment-3917203388 @andygrove Looking at your suggestion for the optimizer trait, it sounds like you had in mind that the logical optimization would evaluate the subquery, rather than doing evalu

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-17 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2819394087 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -450,6 +531,25 @@ impl PhysicalExpr for DynamicFilterPhysicalExpr { } } +///

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-17 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2819378323 ## datafusion/common/src/config.rs: ## @@ -996,6 +996,39 @@ config_namespace! { /// /// Note: This may reduce parallelism, rooting from t

Re: [PR] feat: add support for `aes_decrypt` expression [datafusion-comet]

2026-02-17 Thread via GitHub
rafafrdz commented on code in PR #3497: URL: https://github.com/apache/datafusion-comet/pull/3497#discussion_r2819446184 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -146,6 +146,7 @@ object QueryPlanSerde extends Logging with CometExprShim { cl

Re: [PR] chore: [df52] migration [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2819296741 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -707,22 +707,38 @@ pub fn spark_cast( data_type: &DataType, cast_options: &SparkCastOption

Re: [PR] chore: [df52] migration [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2819302829 ## native/spark-expr/src/utils.rs: ## @@ -71,6 +72,49 @@ pub fn array_with_timezone( to_type: Option<&DataType>, ) -> Result { match array.data_ty

Re: [PR] chore: [df52] migration [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2819304750 ## native/core/src/parquet/parquet_exec.rs: ## @@ -166,27 +203,24 @@ fn get_options( (table_parquet_options, spark_parquet_options) } -fn get_file_co

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-17 Thread via GitHub
darmie commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3917344339 I profiled a large category of regressions and have a fix for them. Sharing findings below. ### Filter columns ⊆ projection columns: zero I/O benefit from RowFilter

Re: [I] Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-02-17 Thread via GitHub
andygrove commented on issue #3781: URL: https://github.com/apache/datafusion/issues/3781#issuecomment-3917346520 Thanks for looking into this @neilconway. This issue is more than 3 years old, so who knows what I was thinking at the time 🤔 ... this issue may be somewhat stale at this point

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-17 Thread via GitHub
adriangb commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2818058399 ## datafusion/common/src/config.rs: ## @@ -996,6 +996,39 @@ config_namespace! { /// /// Note: This may reduce parallelism, rooting from the I/O

Re: [PR] fix: Support concat_ws with literal NULL separator [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on PR #3542: URL: https://github.com/apache/datafusion-comet/pull/3542#issuecomment-3917357349 Test failures: ``` - sql-file: expressions/string/concat_ws.sql [parquet.enable.dictionary=false] *** FAILED *** (719 milliseconds) org.apache.spark.SparkExceptio

Re: [PR] chore: [df52] migration [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2819299162 ## native/spark-expr/src/utils.rs: ## @@ -127,6 +171,7 @@ pub fn array_with_timezone( } fn datetime_cast_err(value: i64) -> ArrowError { +println!("{

Re: [PR] chore: [df52] migration [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2819297807 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -930,7 +951,19 @@ fn cast_array( ))) } }; -Ok(spark_cast_postprocess(

Re: [PR] [WIP] fix: do not ignore test SPARK-48037 [datafusion-comet]

2026-02-17 Thread via GitHub
kazuyukitanimura commented on PR #2774: URL: https://github.com/apache/datafusion-comet/pull/2774#issuecomment-3917489565 > I could look into this perhaps next week if we still need to address this @kazuyukitanimura? Yes please, thank you @coderfender

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-17 Thread via GitHub
darmie commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3917493563 @adriangb you raise a valid point. My approach is a bit too broad. With a multi-conjunct predicate like `WHERE id = 123 AND long_message LIKE '%foo%'`, the RowFilter evaluates co

Re: [PR] fix: Support concat_ws with literal NULL separator [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on PR #3542: URL: https://github.com/apache/datafusion-comet/pull/3542#issuecomment-3917679967 @0lai0, @andygrove: we might want to hold off on merging this PR. There is a test failure, and I am not sure we covered all possible `Literal` conditions in our

Re: [PR] fix: Support concat_ws with literal NULL separator [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on PR #3542: URL: https://github.com/apache/datafusion-comet/pull/3542#issuecomment-3917696373 We have 2 options here: 1. Merge this PR to handle the NULL separator (but continue to ignore the failed test) 2. Handle the case where all inputs are litera

Re: [PR] fix: Support concat_ws with literal NULL separator [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on code in PR #3542: URL: https://github.com/apache/datafusion-comet/pull/3542#discussion_r2819656681 ## spark/src/test/resources/sql-tests/expressions/string/concat_ws.sql: ## @@ -43,5 +43,5 @@ query SELECT concat_ws(' ', first_name, middle_initial, last_

Re: [PR] fix: Support concat_ws with literal NULL separator [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on code in PR #3542: URL: https://github.com/apache/datafusion-comet/pull/3542#discussion_r2819652823 ## spark/src/test/resources/sql-tests/expressions/string/concat_ws.sql: ## @@ -43,5 +43,5 @@ query SELECT concat_ws(' ', first_name, middle_initial, last_

[I] Serialize dynamic filters across network boundaries [datafusion]

2026-02-17 Thread via GitHub
jayshrivastava opened a new issue, #20418: URL: https://github.com/apache/datafusion/issues/20418 ### Is your feature request related to a problem or challenge? Dynamic filters are "inlined" when serialized and deserialized (ex. maybe converted to a `LiteralExpr` or something).

Re: [I] Serialize dynamic filters across network boundaries [datafusion]

2026-02-17 Thread via GitHub
jayshrivastava commented on issue #20418: URL: https://github.com/apache/datafusion/issues/20418#issuecomment-3917720183 take

[PR] perf: Skip RowFilter when all predicate columns are in the projection [datafusion]

2026-02-17 Thread via GitHub
darmie opened a new pull request, #20417: URL: https://github.com/apache/datafusion/pull/20417 ## Which issue does this PR close? - Closes part of #20324 (addresses the "filter columns ⊆ projection columns" category of regressions). - Related: #20325 (Q10 investigation) ## R

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-17 Thread via GitHub
adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3917416683 > 15 of the regressing ClickBench queries (Q10-Q22, Q25, Q27) filter on a column that is also in the `SELECT` projection. When all filter columns are already projected, the Row
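The condition quoted above ("all filter columns are already projected") amounts to a subset check. A minimal sketch of that check follows, assuming columns are identified by name; `filter_columns_in_projection` is a hypothetical helper written for illustration and is not the code from PR #20417 or any DataFusion API.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): returns true when every
/// column referenced by the filter also appears in the projection, i.e.
/// filter columns ⊆ projection columns.
fn filter_columns_in_projection(filter_cols: &[&str], projection_cols: &[&str]) -> bool {
    // Build a set of projected column names for O(1) membership tests.
    let projected: HashSet<&str> = projection_cols.iter().copied().collect();
    // An empty filter column list is trivially a subset.
    filter_cols.iter().all(|col| projected.contains(col))
}
```

For example, a filter on `["a"]` with projection `["a", "b"]` passes the check, while a filter on `["a", "c"]` does not. Whether a reader should act on this check alone is a separate question that the thread itself discusses.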

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-17 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2819399629 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -809,6 +831,23 @@ impl HashJoinExec { self.dynamic_filter.as_ref().map(|df| &df.fil

Re: [PR] chore: [df52] migration [datafusion-comet]

2026-02-17 Thread via GitHub
comphead commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2819550773 ## native/core/src/parquet/parquet_exec.rs: ## @@ -166,27 +203,24 @@ fn get_options( (table_parquet_options, spark_parquet_options) } -fn get_file_con

[PR] chore: CometConf change default batch size values [datafusion-comet]

2026-02-17 Thread via GitHub
comphead opened a new pull request, #3545: URL: https://github.com/apache/datafusion-comet/pull/3545 ## Which issue does this PR close? Closes #. ## Rationale for this change Upon benchmarking, checking if default config values are not optimal for initial setup

Re: [PR] chore: CometConf change default batch size values [datafusion-comet]

2026-02-17 Thread via GitHub
comphead commented on PR #3545: URL: https://github.com/apache/datafusion-comet/pull/3545#issuecomment-3917736047 The results are actually mixed 🤔

Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

2026-02-17 Thread via GitHub
darmie commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3917751164 @adriangb I have tightened the guard in [#20417](https://github.com/apache/datafusion/pull/20417); the batch filter path now only triggers when the predicate has a single *stati

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-17 Thread via GitHub
adriangb commented on code in PR #20414: URL: https://github.com/apache/datafusion/pull/20414#discussion_r2819706516 ## .github/workflows/docs_pr.yaml: ## @@ -44,16 +44,10 @@ jobs: with: submodules: true fetch-depth: 1 - - name: Setup Python -

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-17 Thread via GitHub
adriangb commented on code in PR #20414: URL: https://github.com/apache/datafusion/pull/20414#discussion_r2819707382 ## pyproject.toml: ## @@ -0,0 +1,2 @@ +[tool.uv.workspace] +members = ["benchmarks", "dev", "docs"] Review Comment: Dunno, maybe personal preference, I feel i

Re: [I] Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-02-17 Thread via GitHub
neilconway commented on issue #3781: URL: https://github.com/apache/datafusion/issues/3781#issuecomment-3917789923 @andygrove Thanks, makes sense! I'll plan to start with the approach that I outlined for the time being.

Re: [PR] perf: optimize `array_distinct` with batched row conversion [datafusion]

2026-02-17 Thread via GitHub
Dandandan merged PR #20364: URL: https://github.com/apache/datafusion/pull/20364

Re: [PR] Fix `try_shrink` not freeing back to pool [datafusion]

2026-02-17 Thread via GitHub
hareshkh commented on PR #20382: URL: https://github.com/apache/datafusion/pull/20382#issuecomment-3916558148 @mbutrovich @comphead : Sorry, my bad here. This PR only fixes a bug introduced by https://github.com/apache/datafusion/pull/19759 so branch-51 and branch-52 (which use usize instea

Re: [PR] CI: build and run sqllogictests binary directly in extended workflow [datafusion]

2026-02-17 Thread via GitHub
nuno-faria commented on code in PR #20282: URL: https://github.com/apache/datafusion/pull/20282#discussion_r2818588829 ## .github/workflows/extended.yml: ## @@ -167,11 +167,19 @@ jobs: uses: ./.github/actions/setup-builder with: rust-version: stable

Re: [PR] docs: update roadmap [datafusion-comet]

2026-02-17 Thread via GitHub
mbutrovich merged PR #3543: URL: https://github.com/apache/datafusion-comet/pull/3543

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2819037616 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -332,6 +332,38 @@ abstract class CometTestBase } } +// inspired from spark

Re: [PR] chore: Cast module refactor boolean module [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3491: URL: https://github.com/apache/datafusion-comet/pull/3491#discussion_r2819793815 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1138,25 +1123,26 @@ fn cast_binary_formatter(value: &[u8]) -> String { /// Determines if DataFusio

Re: [PR] chore: Cast module refactor boolean module [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3491: URL: https://github.com/apache/datafusion-comet/pull/3491#discussion_r2819802100 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1138,25 +1123,26 @@ fn cast_binary_formatter(value: &[u8]) -> String { /// Determines if DataFusio

Re: [PR] chore: CometConf change default batch size values [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on PR #3545: URL: https://github.com/apache/datafusion-comet/pull/3545#issuecomment-3917896710 @sqlbenchmark run tpch --iterations 3

Re: [PR] chore(deps): bump maturin from 1.11.5 to 1.12.2 in /docs [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20400: URL: https://github.com/apache/datafusion/pull/20400

Re: [PR] chore(deps): bump uuid from 1.20.0 to 1.21.0 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20401: URL: https://github.com/apache/datafusion/pull/20401

Re: [PR] [Minor] Update object_store to 0.12.5 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20378: URL: https://github.com/apache/datafusion/pull/20378

Re: [I] Optimize the evaluation of `DATE_TRUNC() == )` when pushed down [datafusion]

2026-02-17 Thread via GitHub
drin commented on issue #18319: URL: https://github.com/apache/datafusion/issues/18319#issuecomment-3916520294 For example: ```rust // lhs(>) --> column >= next_interval(part, const_rhs) // lhs(<=) --> column < next_interval(part, const_rhs) ``` If I have `date_trunc('mon
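The two rewrite rules in the comment above can be checked on a simplified model. The sketch below assumes fixed-width integer buckets in place of calendar intervals, so `trunc_to_bucket` is only a stand-in for `date_trunc`. All function names are hypothetical and none of them is a DataFusion API; `next_interval` mirrors the name used in the comment, with the bucket width standing in for `part`.

```rust
/// Stand-in for date_trunc: floor `value` to the start of its bucket.
/// Assumes `width > 0`; div_euclid gives floor division for negatives too.
fn trunc_to_bucket(value: i64, width: i64) -> i64 {
    value.div_euclid(width) * width
}

/// Start of the bucket that follows the one containing `rhs`.
fn next_interval(width: i64, rhs: i64) -> i64 {
    trunc_to_bucket(rhs, width) + width
}

/// Rewritten form of `trunc_to_bucket(column, width) > rhs`.
fn gt_rewritten(column: i64, width: i64, rhs: i64) -> bool {
    column >= next_interval(width, rhs)
}

/// Rewritten form of `trunc_to_bucket(column, width) <= rhs`.
fn le_rewritten(column: i64, width: i64, rhs: i64) -> bool {
    column < next_interval(width, rhs)
}
```

The rewrite holds because a truncated value is always a bucket start, and the smallest bucket start strictly greater than `rhs` is `next_interval(width, rhs)`. Real calendar units such as months have variable width, so this model only illustrates the shape of the argument.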

Re: [PR] feat: Cast date to Numeric (No Op) [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3544: URL: https://github.com/apache/datafusion-comet/pull/3544#discussion_r2818948744 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -960,44 +960,40 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHel

Re: [PR] feat: Cast date to Numeric (No Op) [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on code in PR #3544: URL: https://github.com/apache/datafusion-comet/pull/3544#discussion_r2818971097 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -960,44 +960,40 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanH

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2818993832 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -223,12 +223,22 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHel

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-17 Thread via GitHub
coderfender commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2818999894 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -223,12 +223,22 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanH

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2819007933 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -613,6 +613,20 @@ macro_rules! cast_decimal_to_int32_up { }}; } +macro_rules! cast_int_to_ti

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2819010657 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -332,6 +332,38 @@ abstract class CometTestBase } } +// inspired from spark

Re: [PR] feat: Support int to timestamp casts [datafusion-comet]

2026-02-17 Thread via GitHub
andygrove commented on code in PR #3541: URL: https://github.com/apache/datafusion-comet/pull/3541#discussion_r2819012872 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -332,6 +332,38 @@ abstract class CometTestBase } } +// inspired from spark

Re: [PR] Migrate Python usage to uv workspace [datafusion]

2026-02-17 Thread via GitHub
timsaucer commented on code in PR #20414: URL: https://github.com/apache/datafusion/pull/20414#discussion_r2818270003 ## .github/workflows/docs.yaml: ## @@ -40,17 +40,11 @@ jobs: ref: asf-site path: asf-site - - name: Setup Python -uses: acti

Re: [PR] chore(deps): bump clap from 4.5.57 to 4.5.59 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20404: URL: https://github.com/apache/datafusion/pull/20404

Re: [PR] chore(deps): bump clap from 4.5.57 to 4.5.59 [datafusion]

2026-02-17 Thread via GitHub
dependabot[bot] commented on PR #20404: URL: https://github.com/apache/datafusion/pull/20404#issuecomment-3916311019 Dependabot attempted to update this pull request, but because the branch `dependabot/cargo/main/clap-4.5.59` is protected it was unable to do so.

Re: [PR] chore(deps): bump tonic from 0.14.3 to 0.14.4 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20406: URL: https://github.com/apache/datafusion/pull/20406

Re: [I] Optimize the evaluation of `DATE_TRUNC() == )` when pushed down [datafusion]

2026-02-17 Thread via GitHub
drin commented on issue #18319: URL: https://github.com/apache/datafusion/issues/18319#issuecomment-3916501200 @sdf-jkl I have a couple questions I'm hoping you can help me out with: 1. for `take_function_args` for `date_trunc` function, do I get the comparison operator? I need to know fo

Re: [PR] chore(deps): bump sqllogictest from 0.29.0 to 0.29.1 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20405: URL: https://github.com/apache/datafusion/pull/20405

Re: [PR] chore(deps): bump env_logger from 0.11.8 to 0.11.9 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20402: URL: https://github.com/apache/datafusion/pull/20402

Re: [PR] chore(deps): bump actions/stale from 10.1.1 to 10.2.0 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20397: URL: https://github.com/apache/datafusion/pull/20397

Re: [PR] chore(deps): bump indicatif from 0.18.3 to 0.18.4 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20410: URL: https://github.com/apache/datafusion/pull/20410

Re: [PR] chore(deps): bump liblzma from 0.4.5 to 0.4.6 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20409: URL: https://github.com/apache/datafusion/pull/20409

Re: [PR] chore(deps): bump aws-config from 1.8.13 to 1.8.14 [datafusion]

2026-02-17 Thread via GitHub
comphead merged PR #20407: URL: https://github.com/apache/datafusion/pull/20407

Re: [PR] Fix name tracker [datafusion]

2026-02-17 Thread via GitHub
xanderbailey commented on PR #19856: URL: https://github.com/apache/datafusion/pull/19856#issuecomment-3916679171 @LiaCastaneda are you able to give this a look? It seems like @dd-annarose and @hareshkh are good with it, but I know you're also invested in the substrait work.

[PR] wip [datafusion]

2026-02-17 Thread via GitHub
jayshrivastava opened a new pull request, #20416: URL: https://github.com/apache/datafusion/pull/20416 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes
