Re: [I] [EPIC] Migrate to functions in `datafusion-spark` crate [datafusion-comet]

2026-02-25 Thread via GitHub
coderfender commented on issue #2084: URL: https://github.com/apache/datafusion-comet/issues/2084#issuecomment-3961741909 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] fix: Avoid unnecessary type casts in `concat_ws` [datafusion]

2026-02-25 Thread via GitHub
neilconway commented on code in PR #20436: URL: https://github.com/apache/datafusion/pull/20436#discussion_r2855212342 ## datafusion/functions/src/string/concat.rs: ## @@ -88,37 +88,33 @@ impl ScalarUDFImpl for ConcatFunc { &self.signature } +/// Match the re

Re: [PR] perf: Use Arrow vectorized eq kernel for IN list with column references [datafusion]

2026-02-25 Thread via GitHub
neilconway commented on code in PR #20528: URL: https://github.com/apache/datafusion/pull/20528#discussion_r2855227621 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -771,32 +772,43 @@ impl PhysicalExpr for InListExpr { } }

Re: [PR] (Test) Advanced adaptive filter selectivity evaluation [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20363: URL: https://github.com/apache/datafusion/pull/20363#issuecomment-3961020612 🤖: Benchmark completed Details ``` Comparing HEAD and filter-pushdown-dynamic-bytes Benchmark clickbench_partitioned.json ---

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3961020987 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] perf: Use Hashbrown for array_distinct [datafusion]

2026-02-25 Thread via GitHub
mbutrovich merged PR #20538: URL: https://github.com/apache/datafusion/pull/20538

[PR] perf: Optimize heap handling in TopK operator [datafusion]

2026-02-25 Thread via GitHub
AdamGS opened a new pull request, #20556: URL: https://github.com/apache/datafusion/pull/20556 ## Which issue does this PR close? - Closes #. ## Rationale for this change This change to make a significant performance impact in the `TopK` operator, which is a comm

Re: [PR] perf: Optimize heap handling in TopK operator [datafusion]

2026-02-25 Thread via GitHub
AdamGS commented on code in PR #20556: URL: https://github.com/apache/datafusion/pull/20556#discussion_r2854618773 ## datafusion/sqllogictest/test_files/window.slt: ## @@ -4387,9 +4387,9 @@ LIMIT 5; 78 50 63 38 -3 53 +NULL 19 Review Comment: verified here, c2 is `1.1

Re: [PR] build: fix runs-on tags for consistency [datafusion-comet]

2026-02-25 Thread via GitHub
andygrove merged PR #3601: URL: https://github.com/apache/datafusion-comet/pull/3601

Re: [PR] Extend dynamic filter pushdown to Left and LeftSemi hash joins [datafusion]

2026-02-25 Thread via GitHub
nuno-faria commented on code in PR #20447: URL: https://github.com/apache/datafusion/pull/20447#discussion_r2855318822 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -738,8 +738,10 @@ impl HashJoinExec { } fn allow_join_dynamic_filter_pushdown(&self,

Re: [PR] Extend dynamic filter pushdown to Left and LeftSemi hash joins [datafusion]

2026-02-25 Thread via GitHub
nuno-faria commented on PR #20447: URL: https://github.com/apache/datafusion/pull/20447#issuecomment-3961905449 @adriangb please take a look as well if you have the chance.

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3961162297 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark clickbench_ex

Re: [PR] perf: Optimize heap handling in TopK operator [datafusion]

2026-02-25 Thread via GitHub
AdamGS commented on PR #20556: URL: https://github.com/apache/datafusion/pull/20556#issuecomment-3961167510 If there are any more benchmarks that make sense here, I'm happy to run them; a quick search didn't turn up anything else that looked relevant.

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
Dandandan commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3961193872 run benchmark clickbench_extended DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3961194581 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [iceberg] [datafusion-comet]

2026-02-25 Thread via GitHub
comphead commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2854686142 ## native/core/src/execution/operators/iceberg_scan.rs: ## @@ -313,3 +316,48 @@ impl DisplayAs for IcebergScanExec { ) } } + +/// Build project

Re: [PR] deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [iceberg] [datafusion-comet]

2026-02-25 Thread via GitHub
comphead commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2854691199 ## native/core/src/execution/operators/iceberg_scan.rs: ## @@ -251,38 +258,34 @@ where Poll::Ready(Some(Ok(batch))) => { let fil

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3961230145 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark clickbench_ex

[I] feat: Investigate caching opportunities in `iceberg_scan.rs` [datafusion-comet]

2026-02-25 Thread via GitHub
comphead opened a new issue, #3602: URL: https://github.com/apache/datafusion-comet/issues/3602 This might be a future optimization if I'm understanding control flow correctly, but for default value/schema change columns we're allocating new `ArrayRef` every time. It seems like there are ca

Re: [PR] deps: DataFusion 52.0.0 migration (SchemaAdapter changes, etc.) [iceberg] [datafusion-comet]

2026-02-25 Thread via GitHub
comphead commented on code in PR #3536: URL: https://github.com/apache/datafusion-comet/pull/3536#discussion_r2854726641 ## native/core/src/execution/operators/iceberg_scan.rs: ## @@ -313,3 +316,48 @@ impl DisplayAs for IcebergScanExec { ) } } + +/// Build project

[PR] feat: Add FFI_TableProviderFactory support [datafusion-python]

2026-02-25 Thread via GitHub
davisp opened a new pull request, #1396: URL: https://github.com/apache/datafusion-python/pull/1396 # Which issue does this PR close? Closes #1393 # Rationale for this change This PR wraps the new FFI_TableProviderFactory to support custom `CREATE EXTERNAL TABLE` state

Re: [PR] perf: Optimize `array_to_string()`, support decimal arrays [datafusion]

2026-02-25 Thread via GitHub
neilconway commented on PR #20553: URL: https://github.com/apache/datafusion/pull/20553#issuecomment-3961313349 Updated benchmarks; I added special-cases for integers and floats, and also added a benchmark for floats: ``` ┌──┬┬─┬───┬───

Re: [PR] perf: Optimize `array_to_string()`, support decimal arrays [datafusion]

2026-02-25 Thread via GitHub
neilconway commented on PR #20553: URL: https://github.com/apache/datafusion/pull/20553#issuecomment-3961338927 I think adding dependencies on `itoa` and `ryu` is justified, because the incremental speedup is significant (e.g., 1009 -> 614 us for int64/100). We could potentially use these c

Re: [I] Add utility crate [datafusion-python]

2026-02-25 Thread via GitHub
kevinjqliu commented on issue #1395: URL: https://github.com/apache/datafusion-python/issues/1395#issuecomment-3961328526 Great idea! I was wondering why these functions were not available for import For iceberg-rust, when upgrading to datafusion 52.x, we copied over the definition

Re: [PR] perf: Use Hashbrown for array_distinct [datafusion]

2026-02-25 Thread via GitHub
neilconway commented on PR #20538: URL: https://github.com/apache/datafusion/pull/20538#issuecomment-3961843714 > Thanks @neilconway it is amazing how 1 line can change so much Indeed; the choice of hash function in `std` seems a bit silly, at least for our use-case. #19869 might be w

Re: [PR] perf: Optimize `array_to_string()`, support decimal arrays [datafusion]

2026-02-25 Thread via GitHub
neilconway commented on PR #20553: URL: https://github.com/apache/datafusion/pull/20553#issuecomment-3961617195 So it turns out that when using `ryu`, the output differs in a few cosmetic ways from how Rust's default formatting displays floats. For example, ryu returns "1.0" for the floatin
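
For comparison with the `ryu` output described above, Rust's default float formatting can be checked with the standard library alone. This is a minimal sketch that does not depend on `ryu`; the helper names `display_float` and `debug_float` are hypothetical and only illustrate the `Display` and `Debug` formatting traits.

```rust
/// Formats a float with Rust's default `Display` implementation.
/// `display_float` is a hypothetical helper used only for illustration.
fn display_float(x: f64) -> String {
    // `Display` omits the fractional part when the value is integral.
    format!("{x}")
}

/// Formats a float with Rust's `Debug` implementation.
/// `debug_float` is a hypothetical helper used only for illustration.
fn debug_float(x: f64) -> String {
    // `Debug` keeps a trailing ".0" for integral values.
    format!("{x:?}")
}
```

With these helpers, `display_float(1.0)` yields `"1"` while `debug_float(1.0)` yields `"1.0"`, which is the kind of cosmetic difference the comment refers to.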

Re: [PR] perf: Improve performance of native row-to-columnar transition used by JVM shuffle [datafusion-comet]

2026-02-25 Thread via GitHub
comphead commented on code in PR #3289: URL: https://github.com/apache/datafusion-comet/pull/3289#discussion_r2855082842 ## native/core/src/execution/shuffle/spark_unsafe/row.rs: ## @@ -441,7 +492,662 @@ pub(super) fn append_field( Ok(()) } +/// Appends nested struct fie

Re: [PR] Disallow order by within ordered-set aggregate functions argument lists [datafusion]

2026-02-25 Thread via GitHub
alamb commented on code in PR #20421: URL: https://github.com/apache/datafusion/pull/20421#discussion_r2854879367 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -129,6 +129,14 @@ CREATE TABLE group_median_table_nullable ( # Error tests ### +statement error OR

Re: [PR] [fix] Add type coercion from NULL to Interval to make date_bin more postgres compatible [datafusion]

2026-02-25 Thread via GitHub
gabotechs merged PR #20499: URL: https://github.com/apache/datafusion/pull/20499

Re: [I] date_bin function returning planning error [datafusion]

2026-02-25 Thread via GitHub
gabotechs closed issue #20502: date_bin function returning planning error URL: https://github.com/apache/datafusion/issues/20502

Re: [PR] [fix] Add type coercion from NULL to Interval to make date_bin more postgres compatible [datafusion]

2026-02-25 Thread via GitHub
gabotechs commented on PR #20499: URL: https://github.com/apache/datafusion/pull/20499#issuecomment-3957498614 Thanks @LiaCastaneda for the PR, and @neilconway for the review!

Re: [PR] [fix] Add type coercion from NULL to Interval to make date_bin more postgres compatible [datafusion]

2026-02-25 Thread via GitHub
LiaCastaneda commented on PR #20499: URL: https://github.com/apache/datafusion/pull/20499#issuecomment-3957502371 Thanks both for reviewing 🙇‍♀️

[I] Native Sort-Merge Writer for Iceberg ClusteredWriter Path [datafusion-comet]

2026-02-25 Thread via GitHub
Shekharrajak opened a new issue, #3595: URL: https://github.com/apache/datafusion-comet/issues/3595 ### What is the problem the feature request solves? Implement a fused sort-merge writer in native Rust for Comet's Iceberg write path, eliminating JNI round-trips when writing partitio

Re: [I] [EPIC] Improve Comet Native writer [datafusion-comet]

2026-02-25 Thread via GitHub
Shekharrajak commented on issue #2967: URL: https://github.com/apache/datafusion-comet/issues/2967#issuecomment-3957753150 We can consider this feature to be available in native writer https://github.com/apache/datafusion-comet/issues/3595 - please have a look.

Re: [PR] Check sqllogictests for any dangling config settings(#17914) [datafusion]

2026-02-25 Thread via GitHub
Jefffrey commented on PR #20474: URL: https://github.com/apache/datafusion/pull/20474#issuecomment-3957753864 That sounds good

Re: [PR] chore(deps): bump rustls from 0.23.36 to 0.23.37 [datafusion-ballista]

2026-02-25 Thread via GitHub
milenkovicm merged PR #1474: URL: https://github.com/apache/datafusion-ballista/pull/1474

Re: [PR] proto: serialize and dedupe dynamic filters [datafusion]

2026-02-25 Thread via GitHub
LiaCastaneda commented on code in PR #20416: URL: https://github.com/apache/datafusion/pull/20416#discussion_r2851892219 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -327,6 +468,14 @@ impl DynamicFilterPhysicalExpr { Arc::strong_count(self) > 1

[PR] Add preimage optimization for `ceil` to rewrite predicates into range filters [datafusion]

2026-02-25 Thread via GitHub
kosiew opened a new pull request, #20541: URL: https://github.com/apache/datafusion/pull/20541 ## Which issue does this PR close? * This implements `ceil` part of #20197. --- ## Rationale for this change DataFusion’s preimage framework can turn predicates on determ

[I] `ExecuteQuery` to support push based job status notifications [datafusion-ballista]

2026-02-25 Thread via GitHub
milenkovicm opened a new issue, #1475: URL: https://github.com/apache/datafusion-ballista/issues/1475 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** At the moment, `ExecuteQuery` submits a job, then goes and pulls data from

Re: [I] `ExecuteQuery` to support push based job status notifications [datafusion-ballista]

2026-02-25 Thread via GitHub
milenkovicm commented on issue #1475: URL: https://github.com/apache/datafusion-ballista/issues/1475#issuecomment-3958015762 @danielhumanmod would you be interested?

[PR] fix: handle empty delimiter in split_part (closes #20503) [datafusion]

2026-02-25 Thread via GitHub
gferrate opened a new pull request, #20542: URL: https://github.com/apache/datafusion/pull/20542 ## Which issue does this PR close? - Closes #20503 ## Rationale for this change `split_part` did not handle empty delimiters in a PostgreSQL-compatible way (`split("")` in R
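
The `split("")` behavior mentioned in this description can be reproduced with the standard library alone. A minimal sketch, assuming nothing about the PR's actual implementation; the helper name `split_on_empty` is hypothetical and not part of DataFusion.

```rust
/// Splits `input` on an empty pattern using only the standard library.
/// `split_on_empty` is a hypothetical helper used only for illustration.
fn split_on_empty(input: &str) -> Vec<&str> {
    // With an empty pattern, `str::split` matches at every character
    // boundary, including the start and the end of the string, so the
    // result begins and ends with an empty string.
    input.split("").collect()
}
```

For the input `"abc"` this returns five pieces, the first and last of which are empty, so a naive "take the n-th piece" lookup built on it would not see the original string as a single field.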

Re: [PR] fix: handle empty delimiter in split_part (closes #20503) [datafusion]

2026-02-25 Thread via GitHub
LiaCastaneda commented on code in PR #20542: URL: https://github.com/apache/datafusion/pull/20542#discussion_r2852051394 ## datafusion/functions/src/string/split_part.rs: ## @@ -341,6 +356,117 @@ mod tests { Utf8, StringArray ); +// Edg

[I] Arrow Flight Shuffle for Comet [datafusion-comet]

2026-02-25 Thread via GitHub
Shekharrajak opened a new issue, #3596: URL: https://github.com/apache/datafusion-comet/issues/3596 ### What is the problem the feature request solves? Replace disk-based shuffle with Arrow Flight for direct memory-to-memory data exchange between executors, eliminating intermedia

Re: [I] Review uses of `"{b:02x}"` to hex-encode bytes [datafusion]

2026-02-25 Thread via GitHub
sejalak commented on issue #19569: URL: https://github.com/apache/datafusion/issues/19569#issuecomment-3958289736 take
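
The `"{b:02x}"` pattern named in the issue title can be sketched next to one possible alternative. This is an illustration under assumptions: the issue text is not shown here, so the lookup-table variant is only a hypothetical replacement, and both function names are invented for this example.

```rust
use std::fmt::Write;

/// Hex-encodes bytes by formatting each byte with `{b:02x}`.
fn hex_with_format(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a `String` cannot fail, so unwrapping is safe here.
        write!(out, "{b:02x}").unwrap();
    }
    out
}

/// Hex-encodes bytes with a lookup table (hypothetical alternative that
/// avoids the formatting machinery for each byte).
fn hex_with_table(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // High nibble first, then low nibble.
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}
```

Both functions produce the same lowercase output, so one can be swapped for the other without changing results; whether the table version is faster in practice would need to be measured.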

[PR] Add Null Type Coercions for Placeholders [datafusion]

2026-02-25 Thread via GitHub
cetra3 opened a new pull request, #20543: URL: https://github.com/apache/datafusion/pull/20543 ## Which issue does this PR close? There might be an active issue, I will have a look ## Rationale for this change This fixes a problem we have where placeholder types are `Null

Re: [PR] fix: use try_shrink instead of shrink in try_resize [datafusion]

2026-02-25 Thread via GitHub
ariel-miculas commented on PR #20424: URL: https://github.com/apache/datafusion/pull/20424#issuecomment-3958327844 I don't have an example of when this can happen; it's just that `shrink` `Panics if capacity exceeds [Self::size]`, whereas `try_shrink` returns an error.
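
The difference between a panicking `shrink` and a fallible `try_shrink` can be sketched with a toy type. This is a minimal illustration, assuming a simplified `Reservation` struct that is a hypothetical stand-in and not DataFusion's actual memory reservation type or API.

```rust
/// Toy stand-in for a memory reservation; not DataFusion's actual type.
#[derive(Debug)]
struct Reservation {
    size: usize,
}

impl Reservation {
    /// Releases `capacity` bytes; panics if `capacity` exceeds `size`.
    fn shrink(&mut self, capacity: usize) {
        self.size = self
            .size
            .checked_sub(capacity)
            .expect("capacity exceeds reservation size");
    }

    /// Releases `capacity` bytes; returns an error instead of panicking
    /// and leaves the reservation unchanged on failure.
    fn try_shrink(&mut self, capacity: usize) -> Result<usize, String> {
        match self.size.checked_sub(capacity) {
            Some(new_size) => {
                self.size = new_size;
                Ok(new_size)
            }
            None => Err(format!(
                "cannot shrink by {capacity} bytes: only {} reserved",
                self.size
            )),
        }
    }
}
```

A caller that cannot rule out an oversized request can propagate the `Err` from `try_shrink` rather than risk aborting the query with a panic.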

[I] HiveQL, CTEs and FROM first SELECT statements [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
Viicos opened a new issue, #2236: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2236 https://github.com/apache/datafusion-sqlparser-rs/pull/235 introduced support for HiveQL, and modified how CTEs are parsed to [parse an additional `FROM` keyword](https://github.com/apache/

[I] `PostgreSqlDialect` accepts large amounts of non-PostgreSQL syntax [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
LucaCappelletti94 opened a new issue, #2237: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2237 While building a parser correctness benchmark using libpg_query (`pg_query.rs`) as the PostgreSQL ground truth, we measured how often `PostgreSqlDialect` accepts SQL that real Pos

Re: [I] Incorrect cast of integer columns to utf8 when comparing with utf8 constant [datafusion]

2026-02-25 Thread via GitHub
cetra3 commented on issue #15161: URL: https://github.com/apache/datafusion/issues/15161#issuecomment-3958341953 Here's an example of where `Unknown` might be preferable: https://github.com/apache/datafusion/pull/20543

[PR] Add support for INTERVAL keyword as unquoted identifier in PostgreSQL [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
LucaCappelletti94 opened a new pull request, #2238: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2238 This PR fixes a PostgreSQL dialect parsing bug where `INTERVAL` was always treated as a reserved identifier keyword, causing valid queries such as: ```sql SELECT MAX

[PR] Support `${placeholder}` syntax in tokenizer [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
cetra3 opened a new pull request, #2239: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2239 Add support for dollar-brace placeholders (`${name}`, `${1}`, etc.) in the tokenizer, storing the full `${...}` string in the existing `Token::Placeholder` / `Value::Placeholder` varian

[I] INSERT placeholder type inference corrupted when VALUES contains function-wrapped placeholders [datafusion]

2026-02-25 Thread via GitHub
killme2008 opened a new issue, #20544: URL: https://github.com/apache/datafusion/issues/20544 ### Describe the bug When an `INSERT INTO ... VALUES` statement contains a mix of bare placeholders (`$1`, `$3`) and function-wrapped placeholders (`func($2)`), the inferred types for placeh

Re: [PR] proto: serialize and dedupe dynamic filters [datafusion]

2026-02-25 Thread via GitHub
adriangb commented on code in PR #20416: URL: https://github.com/apache/datafusion/pull/20416#discussion_r2843240928 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -88,6 +88,69 @@ struct Inner { is_complete: bool, } +/// An atomic snapshot of a [`D

Re: [PR] fix: grouping separator for float and decimal [datafusion]

2026-02-25 Thread via GitHub
kosiew commented on code in PR #20268: URL: https://github.com/apache/datafusion/pull/20268#discussion_r2852324601 ## datafusion/spark/src/function/string/format_string.rs: ## @@ -2082,6 +2085,10 @@ impl ConversionSpecifier { } }; +if self.groupin

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-25 Thread via GitHub
gabotechs commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2851555892 ## datafusion/common/src/config.rs: ## @@ -996,6 +996,39 @@ config_namespace! { /// /// Note: This may reduce parallelism, rooting from the I/O

[PR] Support two-argument `TRIM(string, characters)` in PostgreSQL [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
LucaCappelletti94 opened a new pull request, #2240: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2240 PostgreSQL supports `TRIM(string, characters)` as a function form, but `sqlparser-rs` rejected it under `PostgreSqlDialect` with: `ParserError("Expected: ), found: ,")`.

[PR] Only parse FROM identifier in CTE if using Hive [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
Viicos opened a new pull request, #2241: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2241 Fixes https://github.com/apache/datafusion-sqlparser-rs/issues/2236.

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-25 Thread via GitHub
adriangb commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2852423501 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -86,6 +108,11 @@ struct Inner { /// This is redundant with the watch channel state, but

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-25 Thread via GitHub
gabotechs commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2851878804 ## datafusion/physical-optimizer/src/enforce_distribution.rs: ## @@ -1454,21 +1481,87 @@ pub fn ensure_distribution( plan.with_new_children(children_pla

[PR] Support parenthesized `CREATE TABLE ... (LIKE ... INCLUDING/EXCLUDING DEFAULTS)` in `PostgreSQL` [datafusion-sqlparser-rs]

2026-02-25 Thread via GitHub
LucaCappelletti94 opened a new pull request, #2242: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2242 This PR fixes PostgreSQL parsing for: - `CREATE TABLE new (LIKE old INCLUDING DEFAULTS)` - `CREATE TABLE new (LIKE old EXCLUDING DEFAULTS)` These are valid Pos

Re: [I] Release DataFusion 52.2.0 (minor/) Release (Feb 2026) [datafusion]

2026-02-25 Thread via GitHub
hareshkh commented on issue #20287: URL: https://github.com/apache/datafusion/issues/20287#issuecomment-3958590102 @alamb Here is the backport for branch-52: https://github.com/apache/datafusion/pull/20539

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-25 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2852487046 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -86,6 +108,11 @@ struct Inner { /// This is redundant with the watch channel sta

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.2 [datafusion-sandbox]

2026-02-25 Thread via GitHub
dependabot[bot] commented on PR #181: URL: https://github.com/apache/datafusion-sandbox/pull/181#issuecomment-3958660107 Superseded by #184.

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.9 [datafusion-sandbox]

2026-02-25 Thread via GitHub
dependabot[bot] commented on PR #184: URL: https://github.com/apache/datafusion-sandbox/pull/184#issuecomment-3958659993 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the

[PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.9 [datafusion-sandbox]

2026-02-25 Thread via GitHub
dependabot[bot] opened a new pull request, #184: URL: https://github.com/apache/datafusion-sandbox/pull/184 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.67.13 to 2.68.9. Release notes Sourced from https://github.com/taiki-e/install-action/release

Re: [PR] chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.2 [datafusion-sandbox]

2026-02-25 Thread via GitHub
dependabot[bot] closed pull request #181: chore(deps): bump taiki-e/install-action from 2.67.13 to 2.68.2 URL: https://github.com/apache/datafusion-sandbox/pull/181

Re: [PR] feat: Implement Spark `bin` function [datafusion]

2026-02-25 Thread via GitHub
martin-g merged PR #20479: URL: https://github.com/apache/datafusion/pull/20479

Re: [PR] feat: Implement Spark `bin` function [datafusion]

2026-02-25 Thread via GitHub
martin-g commented on PR #20479: URL: https://github.com/apache/datafusion/pull/20479#issuecomment-3958675275 Thank you for the contribution, @kazantsev-maksim! Thank you for the reviews, @Jefffrey & @jonathanc-n!

[PR] fix: skips projection pruning for whole subtree [datafusion]

2026-02-25 Thread via GitHub
Acfboy opened a new pull request, #20545: URL: https://github.com/apache/datafusion/pull/20545 ## Which issue does this PR close? - Closes #18816 . ## Rationale for this change In `UserDefinedLogicalNodeCore`, the default implementation of `necessary_childre

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3958800042 Clearly something is wrong with the benchmarking script too; I will figure that out.

Re: [PR] fix: handle empty delimiter in split_part (closes #20503) [datafusion]

2026-02-25 Thread via GitHub
gferrate commented on code in PR #20542: URL: https://github.com/apache/datafusion/pull/20542#discussion_r2852676223 ## datafusion/functions/src/string/split_part.rs: ## @@ -341,6 +356,117 @@ mod tests { Utf8, StringArray ); +// Edge ca

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on code in PR #20481: URL: https://github.com/apache/datafusion/pull/20481#discussion_r2852747458 ## datafusion/datasource/src/file_stream.rs: ## @@ -113,15 +151,123 @@ impl FileStream { FileStreamState::Idle => { self.file_s

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3958961130 My next steps are: 1. Figure out why the benchmark runner is broken 2. Maybe try and sketch out the pipeline idea in more detail

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3958979921 > I was reading the paper more in depth in recent days and concluded that the essence boils down to those things: I agree > thread per core (in our case already true when

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3958989182 run benchmark clickbench_partitioned

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3958990406 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Gene.bordegaray/2026/02/partition index dynamic filters [datafusion]

2026-02-25 Thread via GitHub
gene-bordegaray commented on code in PR #20331: URL: https://github.com/apache/datafusion/pull/20331#discussion_r2852867045 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -86,6 +108,11 @@ struct Inner { /// This is redundant with the watch channel sta

[I] Make it easier for agents to generate datafusion-python code [datafusion-python]

2026-02-25 Thread via GitHub
timsaucer opened a new issue, #1394: URL: https://github.com/apache/datafusion-python/issues/1394 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** More and more frequently users are reaching for LLMs to generate code and solve

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959152090 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark clickbench_pa

Re: [I] [EPIC] Update Iceberg integration to use `native_iceberg_compat` scan [datafusion-comet]

2026-02-25 Thread via GitHub
Shekharrajak commented on issue #2189: URL: https://github.com/apache/datafusion-comet/issues/2189#issuecomment-3959208287 Looking into it.

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
AdamGS commented on code in PR #20481: URL: https://github.com/apache/datafusion/pull/20481#discussion_r2853041247 ## datafusion/datasource/src/file_stream.rs: ## @@ -130,9 +130,16 @@ impl FileStream { /// /// Since file opening is mostly IO (and may involve a ///

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959319228 The benchmark runner problem seems to have been that `uv` wasn't installed (it is now required after https://github.com/apache/datafusion/pull/20414). I have fixed that a

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959320594 run benchmark clickbench_partitioned

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959321792 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959328155 run benchmark clickbench_partitioned DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb closed pull request #20481: Introduce morsel-driven Parquet scan URL: https://github.com/apache/datafusion/pull/20481

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959333294 The benchmark runner problem seems to have been that uv wasn't installed (it is now required after https://github.com/apache/datafusion/pull/20414). I have fixed that and w

Re: [PR] Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 0.13.0 [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3959357812 run benchmarks

Re: [PR] Extract out date time parsing to a DateTimeParser trait [datafusion]

2026-02-25 Thread via GitHub
martin-g commented on code in PR #19755: URL: https://github.com/apache/datafusion/pull/19755#discussion_r2852897599 ## datafusion/functions/src/datetime/mod.rs: ## @@ -267,7 +269,8 @@ pub mod expr_fn { /// # } /// ``` pub fn to_date(args: Vec) -> Expr { -

Re: [PR] Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 0.13.0 [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3959360119 run benchmark clickbench_partitioned

Re: [PR] Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 0.13.0 [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #19728: URL: https://github.com/apache/datafusion/pull/19728#issuecomment-3959359739 run benchmark clickbench_partitioned DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

[PR] [branch-52] Update aws-smithy, bytes and time for security audits [datafusion]

2026-02-25 Thread via GitHub
alamb opened a new pull request, #20546: URL: https://github.com/apache/datafusion/pull/20546 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/20287 ## Rationale for this change The security audit CI check [failed here](https://git

Re: [PR] Fix name tracker (#19856) [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20539: URL: https://github.com/apache/datafusion/pull/20539#issuecomment-3959414299 The security audit failure is not related to this PR. I made a separate PR to fix it: https://github.com/apache/datafusion/pull/20546

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959417694 🤖: Benchmark completed Details ``` Comparing HEAD and parquet-morsel-driven-execution-237164415184908839 Benchmark clickbench_pa

Re: [PR] Introduce morsel-driven Parquet scan [datafusion]

2026-02-25 Thread via GitHub
alamb-ghbot commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3959418152 🤖 `./gh_compare_branch.sh` [gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gc

Re: [PR] Clamp early aggregation emit to the sort boundary when using partial group ordering [datafusion]

2026-02-25 Thread via GitHub
alamb commented on PR #20446: URL: https://github.com/apache/datafusion/pull/20446#issuecomment-3959428747 @jackkleeman are you OK with merging this PR as is (I will backport it and propose the cleanup as a follow-on PR)?

Re: [I] Release DataFusion 52.2.0 (minor/) Release (Feb 2026) [datafusion]

2026-02-25 Thread via GitHub
alamb commented on issue #20287: URL: https://github.com/apache/datafusion/issues/20287#issuecomment-3959434020 I am looking for some committers to approve backports - [ ] https://github.com/apache/datafusion/pull/20507 - [ ] https://github.com/apache/datafusion/pull/20512 - [ ] http

Re: [PR] Fix name tracker (#19856) [datafusion]

2026-02-25 Thread via GitHub
alamb merged PR #20539: URL: https://github.com/apache/datafusion/pull/20539
