Re: [PR] perf: Push down join key filters for LEFT/RIGHT/ANTI joins [datafusion]

2026-01-27 Thread via GitHub
xudong963 commented on code in PR #19918: URL: https://github.com/apache/datafusion/pull/19918#discussion_r2730802413 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -457,9 +457,9 @@ fn push_down_all_join( // For infer predicates, if they can not push through join,

[PR] chore(deps): bump setuptools from 80.10.1 to 80.10.2 in /docs [datafusion]

2026-01-27 Thread via GitHub
dependabot[bot] opened a new pull request, #20022: URL: https://github.com/apache/datafusion/pull/20022 Bumps [setuptools](https://github.com/pypa/setuptools) from 80.10.1 to 80.10.2. Changelog Sourced from https://github.com/pypa/setuptools/blob/main/NEWS.rst setuptools's chang

Re: [PR] fix: add parentheses to nested binary expression Display [datafusion]

2026-01-27 Thread via GitHub
AndreaBozzo commented on PR #19916: URL: https://github.com/apache/datafusion/pull/19916#issuecomment-3803841177 This originally contained changes to both 'Display' and 'SchemaDisplay', but then I had to revert the 'SchemaDisplay' change, as it seems it needs to be stable for optimizer field lookup.

Re: [PR] chore(deps): bump pbjson-types from 0.8.0 to 0.9.0 in the proto group [datafusion]

2026-01-27 Thread via GitHub
dependabot[bot] commented on PR #20021: URL: https://github.com/apache/datafusion/pull/20021#issuecomment-3803859152 This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests. To ignore these dependencies, configure [ig

Re: [PR] Make scan filter push-down idempotent [datafusion]

2026-01-27 Thread via GitHub
xudong963 commented on code in PR #20003: URL: https://github.com/apache/datafusion/pull/20003#discussion_r2730956061 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1154,34 +1160,109 @@ impl OptimizerRule for PushDownFilter { .map(|(pred, _)| pred);

Re: [I] Improve filter push-down [datafusion]

2026-01-27 Thread via GitHub
xudong963 commented on issue #19929: URL: https://github.com/apache/datafusion/issues/19929#issuecomment-3803967479 > 2\. Now, the optimizer thinks the conjunction `a = 1 AND b = 1` is supported exactly, but it is not. It sounds like the case will cause a wrong result, potentially, do

Re: [PR] fix: correct weight handling in approx_percentile_cont_with_weight [datafusion]

2026-01-27 Thread via GitHub
Jefffrey commented on code in PR #19941: URL: https://github.com/apache/datafusion/pull/19941#discussion_r2730970248 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2029,11 +2029,12 @@ statement ok INSERT INTO t1 VALUES (TRUE); # ISSUE: https://github.com/apache/

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.9 to 2.67.13 [datafusion]

2026-01-27 Thread via GitHub
Jefffrey merged PR #20020: URL: https://github.com/apache/datafusion/pull/20020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

[PR] perf: Parallelize list_files_for_scan using tokio::task::JoinSet [datafusion]

2026-01-27 Thread via GitHub
Tushar7012 opened a new pull request, #20023: URL: https://github.com/apache/datafusion/pull/20023 ## Which issue does this PR close? - Part of improving DataFusion's file listing performance for large-scale table scans. ## Rationale for this change When a `ListingTable`

Re: [PR] Physical-level placeholders [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2731171877 ## datafusion/physical-plan/src/resolve_placeholders.rs: ## @@ -0,0 +1,327 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[PR] chore(deps): bump taiki-e/install-action from 2.67.9 to 2.67.13 [datafusion]

2026-01-27 Thread via GitHub
dependabot[bot] opened a new pull request, #20020: URL: https://github.com/apache/datafusion/pull/20020 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.67.9 to 2.67.13. Release notes Sourced from https://github.com/taiki-e/install-action/releases

Re: [PR] chore(deps): bump setuptools from 80.10.1 to 80.10.2 in /docs [datafusion]

2026-01-27 Thread via GitHub
xudong963 merged PR #20022: URL: https://github.com/apache/datafusion/pull/20022 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Physical-level placeholders [datafusion]

2026-01-27 Thread via GitHub
askalt commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2730729411 ## datafusion/physical-plan/src/resolve_placeholders.rs: ## @@ -0,0 +1,327 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] refactor: extract pushdown test utilities to shared module [datafusion]

2026-01-27 Thread via GitHub
xudong963 merged PR #20010: URL: https://github.com/apache/datafusion/pull/20010 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] chore(deps): bump pbjson-types from 0.8.0 to 0.9.0 in the proto group [datafusion]

2026-01-27 Thread via GitHub
Jefffrey closed pull request #20021: chore(deps): bump pbjson-types from 0.8.0 to 0.9.0 in the proto group URL: https://github.com/apache/datafusion/pull/20021 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] chore(deps): bump pbjson-types from 0.8.0 to 0.9.0 in the proto group [datafusion]

2026-01-27 Thread via GitHub
dependabot[bot] opened a new pull request, #20021: URL: https://github.com/apache/datafusion/pull/20021 Bumps the proto group with 1 update: [pbjson-types](https://github.com/influxdata/pbjson). Updates `pbjson-types` from 0.8.0 to 0.9.0 Commits See full diff in https://

Re: [I] Improve filter push-down [datafusion]

2026-01-27 Thread via GitHub
askalt commented on issue #19929: URL: https://github.com/apache/datafusion/issues/19929#issuecomment-3804046965 > > 2. Now, the optimizer thinks the conjunction `a = 1 AND b = 1` is supported exactly, but it is not. > > It sounds like the case will cause a wrong result, potentially,

Re: [PR] Make scan filter push-down idempotent [datafusion]

2026-01-27 Thread via GitHub
askalt commented on code in PR #20003: URL: https://github.com/apache/datafusion/pull/20003#discussion_r2731044365 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1154,34 +1160,109 @@ impl OptimizerRule for PushDownFilter { .map(|(pred, _)| pred);

Re: [PR] Add Parquet read pruning configuration for max elements in inList [datafusion]

2026-01-27 Thread via GitHub
xudong963 commented on code in PR #19928: URL: https://github.com/apache/datafusion/pull/19928#discussion_r2731072158 ## datafusion/pruning/src/pruning_predicate.rs: ## @@ -461,7 +473,11 @@ impl PruningPredicate { /// returns a new expression. /// It is recommended tha

Re: [PR] perf: Parallelize list_files_for_scan using tokio::task::JoinSet [datafusion]

2026-01-27 Thread via GitHub
Copilot commented on code in PR #20023: URL: https://github.com/apache/datafusion/pull/20023#discussion_r2731215015 ## datafusion/catalog-listing/src/table.rs: ## @@ -712,20 +715,84 @@ impl ListingTable { }); }; // list files (with partitions) -

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2732245481 ## datafusion/physical-plan/src/projection.rs: ## @@ -485,6 +485,45 @@ impl ExecutionPlan for ProjectionExec { .ok() }) } + +

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2732249159 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -722,6 +722,48 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { ) -> Option> { No

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2732234589 ## datafusion/execution/src/task.rs: ## @@ -48,6 +50,8 @@ pub struct TaskContext { window_functions: HashMap>, /// Runtime environment associated with this

[PR] perf: Cache reflection lookups in Iceberg serde for 24% faster serialization [datafusion-comet]

2026-01-27 Thread via GitHub
andygrove opened a new pull request, #3298: URL: https://github.com/apache/datafusion-comet/pull/3298 ## Summary - Cache `Class.forName()` and `getMethod()` reflection calls in a `ReflectionCache` object - Created once per `convert()` call instead of repeatedly for each FileScanTask

Re: [I] Indeterministic test failure in `SpillPool` [datafusion]

2026-01-27 Thread via GitHub
dekuu5 commented on issue #20027: URL: https://github.com/apache/datafusion/issues/20027#issuecomment-3805814194 Hey, I am new to DataFusion. Can I try my luck on this one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[PR] feat: Enable native columnar-to-row by default [WIP] [datafusion-comet]

2026-01-27 Thread via GitHub
andygrove opened a new pull request, #3299: URL: https://github.com/apache/datafusion-comet/pull/3299 ## Which issue does this PR close? Closes #. ## Rationale for this change Enable by default and see which Spark SQL tests fail. ## What changes are

Re: [PR] chore: Add microbenchmark for IcebergScan operator serde roundtrip [datafusion-comet]

2026-01-27 Thread via GitHub
mbutrovich commented on code in PR #3296: URL: https://github.com/apache/datafusion-comet/pull/3296#discussion_r2732593194 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometOperatorSerdeBenchmark.scala: ## @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] chore: Add microbenchmark for IcebergScan operator serde roundtrip [datafusion-comet]

2026-01-27 Thread via GitHub
mbutrovich commented on code in PR #3296: URL: https://github.com/apache/datafusion-comet/pull/3296#discussion_r2732581880 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometOperatorSerdeBenchmark.scala: ## @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Add Parquet read pruning configuration for max elements in inList [datafusion]

2026-01-27 Thread via GitHub
16pierre commented on code in PR #19928: URL: https://github.com/apache/datafusion/pull/19928#discussion_r2732607568 ## datafusion/pruning/src/pruning_predicate.rs: ## @@ -461,7 +473,11 @@ impl PruningPredicate { /// returns a new expression. /// It is recommended that

Re: [I] Unifying Operator Handling with the Scalar Function Framework [datafusion]

2026-01-27 Thread via GitHub
Acfboy commented on issue #20018: URL: https://github.com/apache/datafusion/issues/20018#issuecomment-3805918194 > Interesting idea. I assume you mean to transform 2.0 << 3.5 to something like bit_shift_left(...) and 1 + 'a' to add(...) ? Yes, that's exactly what I mean. The big d

Re: [PR] Add Parquet read pruning configuration for max elements in inList [datafusion]

2026-01-27 Thread via GitHub
16pierre commented on code in PR #19928: URL: https://github.com/apache/datafusion/pull/19928#discussion_r2732619797 ## datafusion/pruning/src/pruning_predicate.rs: ## @@ -461,7 +473,11 @@ impl PruningPredicate { /// returns a new expression. /// It is recommended that

Re: [PR] perf: Iceberg serde ~50% faster serialization [datafusion-comet]

2026-01-27 Thread via GitHub
mbutrovich commented on PR #3298: URL: https://github.com/apache/datafusion-comet/pull/3298#issuecomment-3805933110 Don't we have an IcebergReflection helper? It seems like we should try to encapsulate this logic there. -- This is an automated message from the Apache Git Service. To resp

Re: [PR] chore(deps): bump org.assertj:assertj-core from 3.23.1 to 3.27.7 [datafusion-comet]

2026-01-27 Thread via GitHub
mbutrovich merged PR #3293: URL: https://github.com/apache/datafusion-comet/pull/3293 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] perf: [WIP] [iceberg] Per-partition FileScanTasks [datafusion-comet]

2026-01-27 Thread via GitHub
codecov-commenter commented on PR #3297: URL: https://github.com/apache/datafusion-comet/pull/3297#issuecomment-3805979784 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3297?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Unify the Prettier versions [datafusion]

2026-01-27 Thread via GitHub
cj-zhukov opened a new issue, #20024: URL: https://github.com/apache/datafusion/issues/20024 ### Is your feature request related to a problem or challenge? As we discussed with @Jefffrey in https://github.com/apache/datafusion/pull/19750#discussion_r2726379801 , I want to unify prett

Re: [I] Unify the Prettier versions [datafusion]

2026-01-27 Thread via GitHub
cj-zhukov commented on issue #20024: URL: https://github.com/apache/datafusion/issues/20024#issuecomment-3804463715 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] DataFusion 52 release post [datafusion-site]

2026-01-27 Thread via GitHub
alamb commented on PR #135: URL: https://github.com/apache/datafusion-site/pull/135#issuecomment-3804476844 I plan to post this tomorrow, Wednesday, Jan 28 unless anyone would like more time to comment or contribute -- This is an automated message from the Apache Git Service. To respond t

Re: [I] Reuse file descriptors for LocalFileSystem object storage [datafusion]

2026-01-27 Thread via GitHub
AdamGS commented on issue #19983: URL: https://github.com/apache/datafusion/issues/19983#issuecomment-3804419944 glad to take this if you didn't get started -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[I] Indeterministic test failure in `SpillPool` [datafusion]

2026-01-27 Thread via GitHub
2010YOUY01 opened a new issue, #20027: URL: https://github.com/apache/datafusion/issues/20027 ### Describe the bug The failed test: https://github.com/apache/datafusion/blob/af771970b738f56425b50375cc03b6732a864282/datafusion/physical-plan/src/spill/spill_pool.rs#L414 It failed

Re: [PR] add more projection pushdown slt tests [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on code in PR #20015: URL: https://github.com/apache/datafusion/pull/20015#discussion_r2731667683 ## datafusion/sqllogictest/test_files/projection_pushdown.slt: ## @@ -1038,7 +1084,295 @@ SELECT id, s['value'] + 100, s['label'] || '_test' FROM simple_struct O

Re: [PR] refactor: make PhysicalExprAdatperFactory::create fallible [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on PR #20017: URL: https://github.com/apache/datafusion/pull/20017#issuecomment-3804797636 Btw I don't think an entry in `upgrading.md` is worth it for this small change. LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[PR] Update version to `52.1.0` (#19878) [datafusion]

2026-01-27 Thread via GitHub
alamb opened a new pull request, #20028: URL: https://github.com/apache/datafusion/pull/20028 ## Which issue does this PR close? - part of https://github.com/apache/datafusion/issues/19784 ## Rationale for this change Forward port changes from branch-52 to main ## Wha

Re: [PR] feat(spark): add unix date and timestamp functions [datafusion]

2026-01-27 Thread via GitHub
Jefffrey merged PR #19892: URL: https://github.com/apache/datafusion/pull/19892 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] [datafusion-spark] add `unix_date/micros/millis/seconds` functions [datafusion]

2026-01-27 Thread via GitHub
Jefffrey closed issue #19891: [datafusion-spark] add `unix_date/micros/millis/seconds` functions URL: https://github.com/apache/datafusion/issues/19891 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] feat(spark): add unix date and timestamp functions [datafusion]

2026-01-27 Thread via GitHub
Jefffrey commented on PR #19892: URL: https://github.com/apache/datafusion/pull/19892#issuecomment-3804905561 Thanks @cht42 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] Unifying Operator Handling with the Scalar Function Framework [datafusion]

2026-01-27 Thread via GitHub
Omega359 commented on issue #20018: URL: https://github.com/apache/datafusion/issues/20018#issuecomment-3805348887 Interesting idea. I assume you mean to transform 2.0 << 3.5 to something like bit_shift_left(...) and 1 + 'a' to add(...) ? -- This is an automated message from the Apache Gi

Re: [PR] Improve documentation for ScalarUDFImpl::preimage [datafusion]

2026-01-27 Thread via GitHub
sdf-jkl commented on code in PR #20008: URL: https://github.com/apache/datafusion/pull/20008#discussion_r2732339346 ## datafusion/expr/src/udf.rs: ## @@ -709,20 +709,49 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send + Sync { Ok(ExprSimplifyResult::Original(

Re: [PR] add more projection pushdown slt tests [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on code in PR #20015: URL: https://github.com/apache/datafusion/pull/20015#discussion_r2732377518 ## datafusion/sqllogictest/test_files/projection_pushdown.slt: ## @@ -361,13 +361,58 @@ SELECT id, s['value'] FROM simple_struct ORDER BY s['value']; 5 250 4 3

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2732380338 ## datafusion/physical-plan/src/sorts/sort_preserving_merge.rs: ## @@ -409,6 +410,46 @@ impl ExecutionPlan for SortPreservingMergeExec { .with_fetch(sel

Re: [PR] Various performance improvements [datafusion]

2026-01-27 Thread via GitHub
Dandandan commented on PR #20013: URL: https://github.com/apache/datafusion/pull/20013#issuecomment-3805622687 show benchmark queue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Various performance improvements [datafusion]

2026-01-27 Thread via GitHub
alamb-ghbot commented on PR #20013: URL: https://github.com/apache/datafusion/pull/20013#issuecomment-3805623103 🤖 Hi @Dandandan, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20013#issuecomment-3805622687). | Job | User | Benchmarks | Comment | |

Re: [PR] Blog post about CASE optimization [datafusion-site]

2026-01-27 Thread via GitHub
pepijnve commented on code in PR #122: URL: https://github.com/apache/datafusion-site/pull/122#discussion_r2732441232 ## content/blog/2026-01-26-datafusion_case.md: ## @@ -0,0 +1,468 @@ +--- +layout: post +title: Optimizing SQL CASE Expression Evaluation +date: 2026-01-26 +autho

Re: [PR] Blog post about CASE optimization [datafusion-site]

2026-01-27 Thread via GitHub
pepijnve commented on code in PR #122: URL: https://github.com/apache/datafusion-site/pull/122#discussion_r2732442032 ## content/blog/2026-01-26-datafusion_case.md: ## @@ -0,0 +1,468 @@ +--- +layout: post +title: Optimizing SQL CASE Expression Evaluation +date: 2026-01-26 +autho

Re: [I] Add limit to `DefaultFileStatisticsCache` [datafusion]

2026-01-27 Thread via GitHub
abhita commented on issue #19052: URL: https://github.com/apache/datafusion/issues/19052#issuecomment-3805711175 @mkleen @alamb As the plan here is to enforce limit on `DefaultFileStatisticsCache` , are we also planning on any mechanisms to handle stale entries here? perhaps through evi

Re: [PR] Blog post about CASE optimization [datafusion-site]

2026-01-27 Thread via GitHub
pepijnve commented on PR #122: URL: https://github.com/apache/datafusion-site/pull/122#issuecomment-3805722303 @alamb all comments processed. TYVM for the editorial work. After staring at the same paragraph for too long I start to miss these kinds of details. -- This is an automated messa

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2732251301 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -1152,6 +1194,57 @@ pub fn check_default_invariants( Ok(()) } +/// Verifies that the [`ExecutionPlan

[PR] perf: [WIP] [iceberg] Per partition FileScanTasks [datafusion-comet]

2026-01-27 Thread via GitHub
mbutrovich opened a new pull request, #3297: URL: https://github.com/apache/datafusion-comet/pull/3297 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these change

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on PR #20009: URL: https://github.com/apache/datafusion/pull/20009#issuecomment-3805689393 We decided to leave only one commit in this PR regarding the rewriting of physical expressions. The second commit with `ResolvePhysicalExpr` will be in a separate PR. You can find what

Re: [PR] perf: Iceberg serde ~50% faster serialization [datafusion-comet]

2026-01-27 Thread via GitHub
codecov-commenter commented on PR #3298: URL: https://github.com/apache/datafusion-comet/pull/3298#issuecomment-3805990474 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3298?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Nesting async UDF calls causes an internal error [datafusion]

2026-01-27 Thread via GitHub
andreashgk opened a new issue, #20031: URL: https://github.com/apache/datafusion/issues/20031 ### Describe the bug Directly calling an async UDF on the output of another async UDF `async_example(async_example(1))` produces an internal error: ``` Internal error: async functions

Re: [PR] fix: correct weight handling in approx_percentile_cont_with_weight [datafusion]

2026-01-27 Thread via GitHub
sesteves commented on code in PR #19941: URL: https://github.com/apache/datafusion/pull/19941#discussion_r2732736969 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2029,11 +2029,12 @@ statement ok INSERT INTO t1 VALUES (TRUE); # ISSUE: https://github.com/apache/

[PR] perf: Optimize scalar path for ltrim function [datafusion]

2026-01-27 Thread via GitHub
kumarUjjawal opened a new pull request, #20032: URL: https://github.com/apache/datafusion/pull/20032 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/2984 #. ## Rationale for this change `ltrim` currently routes scalar input

Re: [PR] feat: Enable native columnar-to-row by default [WIP] [datafusion-comet]

2026-01-27 Thread via GitHub
codecov-commenter commented on PR #3299: URL: https://github.com/apache/datafusion-comet/pull/3299#issuecomment-3806070245 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3299?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] perf: Iceberg serde ~50% faster serialization [datafusion-comet]

2026-01-27 Thread via GitHub
andygrove commented on PR #3298: URL: https://github.com/apache/datafusion-comet/pull/3298#issuecomment-3806093578 > Don't we have an IcebergReflection helper? It seems like we should try to encapsulate this logic there. Thanks. I pushed another commit to do this. -- This is an au

Re: [PR] perf: [WIP] [iceberg] Per-partition FileScanTasks [datafusion-comet]

2026-01-27 Thread via GitHub
mbutrovich closed pull request #3297: perf: [WIP] [iceberg] Per-partition FileScanTasks URL: https://github.com/apache/datafusion-comet/pull/3297 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] perf: Parallelize list_files_for_scan using tokio::task::JoinSet [datafusion]

2026-01-27 Thread via GitHub
Tushar7012 commented on PR #20023: URL: https://github.com/apache/datafusion/pull/20023#issuecomment-3804411253 All CI checks passing Regarding the Copilot review comment about memory trade-off: I've added documentation in the code explaining this intentional design decision. The par

Re: [PR] Various performance improvements [datafusion]

2026-01-27 Thread via GitHub
Dandandan commented on PR #20013: URL: https://github.com/apache/datafusion/pull/20013#issuecomment-3804774176 show benchmark queue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Various performance improvements [datafusion]

2026-01-27 Thread via GitHub
alamb-ghbot commented on PR #20013: URL: https://github.com/apache/datafusion/pull/20013#issuecomment-3804774326 🤖 Hi @Dandandan, you asked to view the benchmark queue (https://github.com/apache/datafusion/pull/20013#issuecomment-3804774176). | Job | User | Benchmarks | Comment | |

Re: [I] Unifying Operator Handling with the Scalar Function Framework [datafusion]

2026-01-27 Thread via GitHub
Acfboy commented on issue #20018: URL: https://github.com/apache/datafusion/issues/20018#issuecomment-3804947631 Thank you for the feedback @2010YOUY01 ! I'd love to try refactoring one operator first to see how it works with the ScalarUDFImpl framework. I’m thinking of starting with

[PR] Add `rust-required-checks` [datafusion]

2026-01-27 Thread via GitHub
blaginin opened a new pull request, #20029: URL: https://github.com/apache/datafusion/pull/20029 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/issues/6880 ## Rationale for this change When we were testing merge queues for this repo, I

Re: [PR] add more projection pushdown slt tests [datafusion]

2026-01-27 Thread via GitHub
alamb commented on code in PR #20015: URL: https://github.com/apache/datafusion/pull/20015#discussion_r2731983401 ## datafusion/sqllogictest/test_files/projection_pushdown.slt: ## @@ -361,13 +361,58 @@ SELECT id, s['value'] FROM simple_struct ORDER BY s['value']; 5 250 4 300

Re: [I] Unifying Operator Handling with the Scalar Function Framework [datafusion]

2026-01-27 Thread via GitHub
2010YOUY01 commented on issue #20018: URL: https://github.com/apache/datafusion/issues/20018#issuecomment-3804518231 The high level idea I think is perfect, unifying them can make a lot of implementation simpler. However I'm not sure how hard the implementation would be, I suggest we

[PR] minor: Move metric `page_index_rows_pruned` to verbose level in `EXPLAIN ANALYZE` [datafusion]

2026-01-27 Thread via GitHub
2010YOUY01 opened a new pull request, #20026: URL: https://github.com/apache/datafusion/pull/20026 ## Which issue does this PR close? - Closes #. ## Rationale for this change There are two similar parquet page pruning metrics: 1. page_index_pages_pruned

Re: [I] Explore replacing ad-hoc parsing logic in datafusion-examples with a nom-based parser [datafusion]

2026-01-27 Thread via GitHub
cj-zhukov commented on issue #20025: URL: https://github.com/apache/datafusion/issues/20025#issuecomment-3804519750 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[I] Explore replacing ad-hoc parsing logic in datafusion-examples with a nom-based parser [datafusion]

2026-01-27 Thread via GitHub
cj-zhukov opened a new issue, #20025: URL: https://github.com/apache/datafusion/issues/20025 ### Is your feature request related to a problem or challenge? As we discussed with @Jefffrey in https://github.com/apache/datafusion/pull/19750#issuecomment-3803158136 , I’d like to explore

Re: [PR] add more projection pushdown slt tests [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on code in PR #20015: URL: https://github.com/apache/datafusion/pull/20015#discussion_r2731628359 ## datafusion/sqllogictest/test_files/projection_pushdown.slt: ## @@ -1038,7 +1084,295 @@ SELECT id, s['value'] + 100, s['label'] || '_test' FROM simple_struct O

[PR] chore(deps): bump taiki-e/install-action from 2.66.7 to 2.67.13 [datafusion-sandbox]

2026-01-27 Thread via GitHub
dependabot[bot] opened a new pull request, #146: URL: https://github.com/apache/datafusion-sandbox/pull/146 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.66.7 to 2.67.13. Release notes Sourced from https://github.com/taiki-e/install-action/release

Re: [PR] chore(deps): bump taiki-e/install-action from 2.66.7 to 2.67.11 [datafusion-sandbox]

2026-01-27 Thread via GitHub
dependabot[bot] commented on PR #145: URL: https://github.com/apache/datafusion-sandbox/pull/145#issuecomment-3804671240 Superseded by #146. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] chore(deps): bump taiki-e/install-action from 2.66.7 to 2.67.11 [datafusion-sandbox]

2026-01-27 Thread via GitHub
dependabot[bot] closed pull request #145: chore(deps): bump taiki-e/install-action from 2.66.7 to 2.67.11 URL: https://github.com/apache/datafusion-sandbox/pull/145 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] chore(deps): bump taiki-e/install-action from 2.66.7 to 2.67.13 [datafusion-sandbox]

2026-01-27 Thread via GitHub
dependabot[bot] commented on PR #146: URL: https://github.com/apache/datafusion-sandbox/pull/146#issuecomment-3804671099 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the

Re: [PR] Blog post about CASE optimization [datafusion-site]

2026-01-27 Thread via GitHub
alamb commented on code in PR #122: URL: https://github.com/apache/datafusion-site/pull/122#discussion_r2731489110 ## content/blog/2026-01-26-datafusion_case.md: ## @@ -0,0 +1,468 @@ +--- +layout: post +title: Optimizing SQL CASE Expression Evaluation +date: 2026-01-26 +author:

Re: [PR] feat(spark): Adds negative spark function [datafusion]

2026-01-27 Thread via GitHub
SubhamSinghal commented on code in PR #20006: URL: https://github.com/apache/datafusion/pull/20006#discussion_r2732043789 ## datafusion/spark/src/function/math/negative.rs: ## @@ -0,0 +1,410 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] feat: optimise copying in `left` for Utf8 and LargeUtf8 [datafusion]

2026-01-27 Thread via GitHub
theirix commented on PR #19980: URL: https://github.com/apache/datafusion/pull/19980#issuecomment-3805253664 > I think it would be a good idea to enhance the benchmark to add Utf8View case as well; we can simply compare it to Utf8 numbers as the before case. Good idea, I'll add it sho

Re: [PR] feat: Add spark-compat mode to integrate datafusion-spark features au… [datafusion-ballista]

2026-01-27 Thread via GitHub
milenkovicm commented on PR #1416: URL: https://github.com/apache/datafusion-ballista/pull/1416#issuecomment-3805243066 > I'm not certain that we have a well-defined place to document compile-time features, just docs for runtime configurations. I'd like to follow this up with a markdown fi

Re: [PR] Add Parquet read pruning configuration for max elements in inList [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on code in PR #19928: URL: https://github.com/apache/datafusion/pull/19928#discussion_r2731532157 ## datafusion/pruning/src/pruning_predicate.rs: ## @@ -461,7 +473,11 @@ impl PruningPredicate { /// returns a new expression. /// It is recommended that

[D] DISCUSSION: Apache DataFusion New York Meetup April / May 2026 [datafusion]

2026-01-27 Thread via GitHub
GitHub user adriangb created a discussion: DISCUSSION: Apache DataFusion New York Meetup April / May 2026 Last NYC meetup (#10343) was a great success, I think ~6 months later is a great time to do another one! @gene-bordegaray and I will be organizing, we'll host at DataDog again. Tentative

[PR] Add microbenchmark for IcebergScan operator serde roundtrip [datafusion-comet]

2026-01-27 Thread via GitHub
andygrove opened a new pull request, #3296: URL: https://github.com/apache/datafusion-comet/pull/3296 ## Summary This PR adds a microbenchmark for measuring the serialization/deserialization performance of Iceberg `FileScanTask` objects to protobuf. The benchmark: - Creates

Re: [PR] Rewrite physical expressions in execution plans [datafusion]

2026-01-27 Thread via GitHub
LLDay commented on code in PR #20009: URL: https://github.com/apache/datafusion/pull/20009#discussion_r2732231656 ## datafusion/physical-plan/src/resolve_placeholders.rs: ## @@ -0,0 +1,327 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat(spark): Adds negative spark function [datafusion]

2026-01-27 Thread via GitHub
SubhamSinghal commented on code in PR #20006: URL: https://github.com/apache/datafusion/pull/20006#discussion_r2732231606 ## datafusion/spark/src/function/math/negative.rs: ## @@ -0,0 +1,410 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] add more projection pushdown slt tests [datafusion]

2026-01-27 Thread via GitHub
adriangb merged PR #20015: URL: https://github.com/apache/datafusion/pull/20015

Re: [PR] feat: Extract NDV (distinct_count) statistics from Parquet metadata [datafusion]

2026-01-27 Thread via GitHub
gabotechs commented on code in PR #19957: URL: https://github.com/apache/datafusion/pull/19957#discussion_r2732799159 ## datafusion/physical-expr/src/projection.rs: ## @@ -660,9 +660,25 @@ impl ProjectionExprs { } } } else { -

Re: [PR] feat: Add Semi/Anti join to PiecewiseMergeJoin [datafusion]

2026-01-27 Thread via GitHub
comphead commented on PR #18392: URL: https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369 run benchmarks tpcds

Re: [PR] feat: Add Semi/Anti join to PiecewiseMergeJoin [datafusion]

2026-01-27 Thread via GitHub
alamb-ghbot commented on PR #18392: URL: https://github.com/apache/datafusion/pull/18392#issuecomment-380618 🤖 Hi @comphead, thanks for the request (https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369). [`scrape_comments.py`](https://github.com/alamb/datafusio

[I] Support ANSI mode for `negate` function [datafusion]

2026-01-27 Thread via GitHub
comphead opened a new issue, #20034: URL: https://github.com/apache/datafusion/issues/20034 ### Is your feature request related to a problem or challenge? https://github.com/apache/datafusion/pull/20006 now supports `negate` function for Spark, however it should also consider ANSI mod

Re: [PR] Allow struct field access projections to be pushed down into scans [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on PR #19538: URL: https://github.com/apache/datafusion/pull/19538#issuecomment-3806270548 @alamb I've renamed the terminology to: ```rust pub enum ExpressionPlacement { /// Argument is a literal constant value or an expression that can be /// evalua

[I] Spark SQL test failures when native columnar-to-row is enabled by default [datafusion-comet]

2026-01-27 Thread via GitHub
andygrove opened a new issue, #3300: URL: https://github.com/apache/datafusion-comet/issues/3300 ## Summary When enabling native columnar-to-row conversion by default (PR #3299), several Spark SQL tests fail across all supported Spark versions (3.4, 3.5, 4.0). The failures fall into

Re: [PR] feat: implement protobuf converter trait to allow control over serialization and deserialization processes [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on PR #19437: URL: https://github.com/apache/datafusion/pull/19437#issuecomment-3806467978 I plan to merge this once CI passes

Re: [PR] Disallow positional struct casting when field names don’t overlap [datafusion]

2026-01-27 Thread via GitHub
adriangb commented on PR #19955: URL: https://github.com/apache/datafusion/pull/19955#issuecomment-3806473837 I plan to merge this tomorrow morning EST if there is no further feedback

Re: [I] Make PhysicalExprAdapterFactory::create fallible [datafusion]

2026-01-27 Thread via GitHub
adriangb closed issue #19956: Make PhysicalExprAdapterFactory::create fallible URL: https://github.com/apache/datafusion/issues/19956
