Re: [PR] feat: implement protobuf converter trait to allow control over serialization and deserialization processes [datafusion]

2026-01-19 Thread via GitHub
timsaucer commented on PR #19437: URL: https://github.com/apache/datafusion/pull/19437#issuecomment-3770144434 > It unfortunately has conflicts which blocks CI from running. I'd be happy to resolve them but don't want to force push to your branch. Could you clear them up so CI can run?

Re: [PR] Support API for "pre-image" for pruning predicate evaluation [datafusion]

2026-01-19 Thread via GitHub
alamb commented on code in PR #19722: URL: https://github.com/apache/datafusion/pull/19722#discussion_r2706112406 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1969,12 +1972,101 @@ impl TreeNodeRewriter for Simplifier<'_> { }))
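A rough illustration of the pre-image idea, inferred from the PR title and not taken from the PR: a predicate on a function of a column, `f(col) = c`, is rewritten as a range on `col` itself, which min/max statistics can then prune directly. The sketch uses integer floor division as a stand-in for a function such as date truncation. `floor_div_preimage`, `can_prune`, and `Interval` are hypothetical names, not DataFusion API.

```rust
/// Half-open interval [lo, hi) of column values (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

/// Hypothetical: the pre-image of `x.div_euclid(d) == c` for d > 0.
/// Every x with floor(x / d) == c lies in [c * d, (c + 1) * d).
fn floor_div_preimage(c: i64, d: i64) -> Interval {
    assert!(d > 0, "divisor must be positive");
    Interval { lo: c * d, hi: (c + 1) * d }
}

/// A container (file, row group) whose statistics are [min, max] can be
/// skipped when that range does not overlap the pre-image interval.
fn can_prune(min: i64, max: i64, pre: &Interval) -> bool {
    max < pre.lo || min >= pre.hi
}
```

For example, `floor(x / 10) = 3` becomes `x >= 30 AND x < 40`, so a row group with statistics `[0, 29]` can be skipped without evaluating the function at all.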

Re: [PR] chore(deps): bump taiki-e/install-action from 2.66.2 to 2.66.5 [datafusion-sandbox]

2026-01-19 Thread via GitHub
dependabot[bot] closed pull request #123: chore(deps): bump taiki-e/install-action from 2.66.2 to 2.66.5 URL: https://github.com/apache/datafusion-sandbox/pull/123 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[PR] chore(deps): bump taiki-e/install-action from 2.66.2 to 2.66.7 [datafusion-sandbox]

2026-01-19 Thread via GitHub
dependabot[bot] opened a new pull request, #125: URL: https://github.com/apache/datafusion-sandbox/pull/125 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.66.2 to 2.66.7. Release notes Sourced from https://github.com/taiki-e/install-action/releases

Re: [PR] chore(deps): bump taiki-e/install-action from 2.66.2 to 2.66.5 [datafusion-sandbox]

2026-01-19 Thread via GitHub
dependabot[bot] commented on PR #123: URL: https://github.com/apache/datafusion-sandbox/pull/123#issuecomment-3768278201 Superseded by #125.

Re: [PR] chore(deps): bump taiki-e/install-action from 2.66.2 to 2.66.7 [datafusion-sandbox]

2026-01-19 Thread via GitHub
dependabot[bot] commented on PR #125: URL: https://github.com/apache/datafusion-sandbox/pull/125#issuecomment-3768278117 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the

Re: [I] Andrew Lamb Weekly-ish Open Source plan - 2026-01-05 [datafusion]

2026-01-19 Thread via GitHub
alamb closed issue #19652: Andrew Lamb Weekly-ish Open Source plan - 2026-01-05 URL: https://github.com/apache/datafusion/issues/19652

[I] Andrew Lamb Weekly-ish Open Source plan - 2026-01-19 [datafusion]

2026-01-19 Thread via GitHub
alamb opened a new issue, #19886: URL: https://github.com/apache/datafusion/issues/19886 This is my weekly plan, mostly for my own organizational need. I am making it public in the hopes that it helps others to see what I am working on -- also I spend so much time in github the interface is v

Re: [I] Andrew Lamb Weekly-ish Open Source plan - 2026-01-05 [datafusion]

2026-01-19 Thread via GitHub
alamb commented on issue #19652: URL: https://github.com/apache/datafusion/issues/19652#issuecomment-3768500072 Next chunk: - https://github.com/apache/datafusion/issues/19886

Re: [I] Pushing down HashJoinExec build side dynamic filters makes tpch queries slower [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on issue #19858: URL: https://github.com/apache/datafusion/issues/19858#issuecomment-3768551009 > My bet is that we’re seeing the **cost of evaluation on the probe side (2)**, specifically in cases where the dynamic filter has low selectivity. I think that's a reaso

[PR] Feat/extensions protobuf issue 1370 [datafusion-ballista]

2026-01-19 Thread via GitHub
LouisBurke opened a new pull request, #1393: URL: https://github.com/apache/datafusion-ballista/pull/1393 # Which issue does this PR close? Closes #. # Rationale for this change Doc clean up and minor addition to extensions example docs. # What changes are included in thi

Re: [PR] Feat/extensions protobuf issue 1370 [datafusion-ballista]

2026-01-19 Thread via GitHub
LouisBurke commented on PR #1393: URL: https://github.com/apache/datafusion-ballista/pull/1393#issuecomment-3768593543 @milenkovicm

Re: [PR] Fix struct casts to align fields by name (prevent positional mis-casts) [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on PR #19674: URL: https://github.com/apache/datafusion/pull/19674#issuecomment-3768609588 > > 1. Require at least one matching field - Eliminate positional fallback entirely > > I agree with this. The OP still says: > When two structs have the same se
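The matching rule quoted above (coerce by name, and require at least one matching field instead of falling back to positional matching) can be sketched on field names alone. This is a hypothetical helper, not the code in `nested_struct.rs`; in particular, treating target fields that are absent from the source as null-filled is an assumption.

```rust
/// Hypothetical: for each target field, the index of the same-named source
/// field, or None if the source lacks it (assumed to be filled with nulls).
/// Errors instead of falling back to positional matching when no name overlaps.
fn align_fields_by_name(source: &[&str], target: &[&str]) -> Result<Vec<Option<usize>>, String> {
    let mapping: Vec<Option<usize>> = target
        .iter()
        .map(|t| source.iter().position(|s| s == t))
        .collect();
    if mapping.iter().all(|m| m.is_none()) {
        return Err(format!(
            "no field names in common between {:?} and {:?}",
            source, target
        ));
    }
    Ok(mapping)
}
```

Matching by name means `{a, b}` cast to `{b, a}` swaps the columns correctly rather than pairing `a` with `b` by position.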

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
mjgarton commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2705090411 ## datafusion/core/src/physical_planner.rs: ## @@ -1912,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELET

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
mjgarton commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2705099717 ## datafusion/core/src/physical_planner.rs: ## @@ -1912,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELET

Re: [PR] feat: add complex type support to native Parquet writer [datafusion-comet]

2026-01-19 Thread via GitHub
wForget commented on code in PR #3214: URL: https://github.com/apache/datafusion-comet/pull/3214#discussion_r2706444516 ## native/core/src/execution/operators/parquet_writer.rs: ## @@ -535,8 +535,12 @@ impl ExecutionPlan for ParquetWriterExec { DataFusionError::

Re: [PR] Support API for "pre-image" for pruning predicate evaluation [datafusion]

2026-01-19 Thread via GitHub
alamb commented on PR #19722: URL: https://github.com/apache/datafusion/pull/19722#issuecomment-3770693965 @sdf-jkl here are some tests and other small suggestions - https://github.com/sdf-jkl/datafusion/pull/1

Re: [PR] Fix incorrect regex pattern in regex_replace_posix_groups [datafusion]

2026-01-19 Thread via GitHub
GaneshPatil7517 commented on code in PR #19827: URL: https://github.com/apache/datafusion/pull/19827#discussion_r2703542503 ## datafusion/functions/src/regex/regexpreplace.rs: ## @@ -189,13 +189,15 @@ fn regexp_replace_func(args: &[ColumnarValue]) -> Result { } } -/// r

[PR] perf: Optimize scalar performance for cot [datafusion]

2026-01-19 Thread via GitHub
kumarUjjawal opened a new pull request, #19888: URL: https://github.com/apache/datafusion/pull/19888 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion-comet/issues/2986. ## Rationale for this change The cot function currently conve

[PR] fix: preserve state in DistinctMedianAccumulator::evaluate() for window frame queries [datafusion]

2026-01-19 Thread via GitHub
kumarUjjawal opened a new pull request, #19887: URL: https://github.com/apache/datafusion/pull/19887 ## Which issue does this PR close? - Closes #19612. ## Rationale for this change The `DistinctMedianAccumulator::evaluate()` method was using `std::mem::take()` w
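Why `std::mem::take()` is a problem for window frames: it moves the accumulated values out and leaves an empty default behind, so any later `evaluate()` call on the same accumulator (as happens when a window frame grows row by row) sees no prior values. A simplified, hypothetical accumulator showing both variants; it is not DataFusion's `DistinctMedianAccumulator` or its `Accumulator` trait.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified distinct-median accumulator over i64 values.
#[derive(Default)]
struct DistinctMedian {
    values: HashSet<i64>,
}

impl DistinctMedian {
    fn update(&mut self, v: i64) {
        self.values.insert(v);
    }

    /// Buggy variant: `mem::take` moves the set out and leaves it empty,
    /// so a second call loses everything accumulated so far.
    fn evaluate_take(&mut self) -> Option<f64> {
        let values = std::mem::take(&mut self.values);
        median(values.into_iter().collect())
    }

    /// State-preserving variant: copy the values and leave `self` intact.
    fn evaluate(&self) -> Option<f64> {
        median(self.values.iter().copied().collect())
    }
}

/// Median of the values, averaging the two middle elements for even lengths.
fn median(mut v: Vec<i64>) -> Option<f64> {
    if v.is_empty() {
        return None;
    }
    v.sort_unstable();
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 0 {
        (v[mid - 1] as f64 + v[mid] as f64) / 2.0
    } else {
        v[mid] as f64
    })
}
```

The copy costs an allocation per call, which is the trade-off for being safe to evaluate repeatedly.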

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
ethan-tyler commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2706191701 ## datafusion/core/tests/custom_sources_cases/dml_planning.rs: ## @@ -246,6 +269,75 @@ async fn test_delete_complex_expr() -> Result<()> { Ok(()) } +#[

Re: [PR] Allow struct field access projections to be pushed down into scans [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on code in PR #19538: URL: https://github.com/apache/datafusion/pull/19538#discussion_r2700282271 ## datafusion/sqllogictest/test_files/limit.slt: ## @@ -846,10 +846,10 @@ logical_plan 05)Sort: test_limit_with_partitions.part_key ASC NULLS LAST, fetch

Re: [I] [Feature] Support Spark expression: days [datafusion-comet]

2026-01-19 Thread via GitHub
kazantsev-maksim commented on issue #3124: URL: https://github.com/apache/datafusion-comet/issues/3124#issuecomment-3769551895 I would like to work on it.

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on PR #19884: URL: https://github.com/apache/datafusion/pull/19884#issuecomment-3769536637 Would it help to have a `filters` field on `TableScan` similar to projection? That's something I've wanted for various reasons. My view is that if we define what a scan is as a

Re: [PR] feat: support pushdown alias on dynamic filter with `ProjectionExec` [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on code in PR #19404: URL: https://github.com/apache/datafusion/pull/19404#discussion_r2705691145 ## datafusion/physical-plan/src/projection.rs: ## @@ -347,10 +371,30 @@ impl ExecutionPlan for ProjectionExec { parent_filters: Vec>, _config: &
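Pushing a filter through a projection that renames columns means rewriting each column reference from its output alias back to the input column, and refusing the pushdown when a referenced column has no plain-column source. A toy sketch under that assumption; `Expr`, the alias map, and `rewrite_through_projection` are hypothetical and far simpler than `ProjectionExec`'s real filter-pushdown hooks.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// `aliases` maps a projection output name to the input column it renames
/// (only plain renames; computed outputs are simply absent).
/// Returns None when the filter references a column that cannot be pushed down.
fn rewrite_through_projection(filter: &Expr, aliases: &HashMap<String, String>) -> Option<Expr> {
    match filter {
        Expr::Column(name) => aliases.get(name).map(|src| Expr::Column(src.clone())),
        Expr::Literal(v) => Some(Expr::Literal(*v)),
        Expr::Gt(l, r) => Some(Expr::Gt(
            Box::new(rewrite_through_projection(l, aliases)?),
            Box::new(rewrite_through_projection(r, aliases)?),
        )),
    }
}
```

With `SELECT a AS b ...`, a filter `b > 5` arriving from above becomes `a > 5` below the projection.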

Re: [PR] Support API for "pre-image" for pruning predicate evaluation [datafusion]

2026-01-19 Thread via GitHub
sdf-jkl commented on PR #19722: URL: https://github.com/apache/datafusion/pull/19722#issuecomment-3769929618 Should be good

Re: [PR] Support API for "pre-image" for pruning predicate evaluation [datafusion]

2026-01-19 Thread via GitHub
sdf-jkl commented on PR #19722: URL: https://github.com/apache/datafusion/pull/19722#issuecomment-3770366225 I definitely appreciate the feedback, and back and forth. Thanks, I'll work on addressing it.

Re: [PR] Support API for "pre-image" for pruning predicate evaluation [datafusion]

2026-01-19 Thread via GitHub
sdf-jkl commented on code in PR #19722: URL: https://github.com/apache/datafusion/pull/19722#discussion_r2706470597 ## datafusion/optimizer/src/simplify_expressions/udf_preimage.rs: ## @@ -0,0 +1,270 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] Release DataFusion 52.1.0 or 52.0.1 (minor/patch) Release (Jan 2026) [datafusion]

2026-01-19 Thread via GitHub
alamb commented on issue #19784: URL: https://github.com/apache/datafusion/issues/19784#issuecomment-3769334146 Thanks to @milenkovicm for the approval. I have made a 52.1.0 release candidate and started voting https://lists.apache.org/thread/fmdnt05qnj2hqw87w49jf658q8qtxzc1

Re: [I] Planning time for queries with many columns with union and order by is very slow [datafusion]

2026-01-19 Thread via GitHub
Omega359 commented on issue #17261: URL: https://github.com/apache/datafusion/issues/17261#issuecomment-3769342137 I ran samply last night against a slightly modified sql_planner_extended benchmark which had some interesting results. ```Rust fn criterion_benchmark(c: &mut Criterio

Re: [PR] [branch-52] Update version to `52.1.0` [datafusion]

2026-01-19 Thread via GitHub
alamb merged PR #19878: URL: https://github.com/apache/datafusion/pull/19878

Re: [PR] chore: update datafusion to 52.0 [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm commented on code in PR #1394: URL: https://github.com/apache/datafusion-ballista/pull/1394#discussion_r2706144736 ## ballista/core/src/execution_plans/shuffle_writer.rs: ## @@ -263,6 +263,10 @@ impl ShuffleWriterExec { let mut partitioner = Bat

Re: [I] Release datafusion ballista v.51 [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm closed issue #1371: Release datafusion ballista v.51 URL: https://github.com/apache/datafusion-ballista/issues/1371

Re: [I] Ensure idempotency in the DataFusion physical optimizer to support Ballista AQE [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm closed issue #1378: Ensure idempotency in the DataFusion physical optimizer to support Ballista AQE URL: https://github.com/apache/datafusion-ballista/issues/1378

Re: [I] Ensure idempotency in the DataFusion physical optimizer to support Ballista AQE [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm commented on issue #1378: URL: https://github.com/apache/datafusion-ballista/issues/1378#issuecomment-3770271465 i guess we can close this task, if new issues are found we could track it in #1359 thanks @danielhumanmod

Re: [PR] perf: Optimize `multi_group_by` when there are a lot of unique groups [datafusion]

2026-01-19 Thread via GitHub
github-actions[bot] commented on PR #17592: URL: https://github.com/apache/datafusion/pull/17592#issuecomment-3770712815 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Code changing test [datafusion-sandbox]

2026-01-19 Thread via GitHub
github-actions[bot] closed pull request #59: Code changing test URL: https://github.com/apache/datafusion-sandbox/pull/59

Re: [PR] Non code changing test [datafusion-sandbox]

2026-01-19 Thread via GitHub
github-actions[bot] closed pull request #58: Non code changing test URL: https://github.com/apache/datafusion-sandbox/pull/58

Re: [PR] feat: [EXPERIMENTAL] Native columnar to row conversion [datafusion-comet]

2026-01-19 Thread via GitHub
codecov-commenter commented on PR #3221: URL: https://github.com/apache/datafusion-comet/pull/3221#issuecomment-3770758786 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3221?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] docs: add Docker-based workflow for building documentation [datafusion]

2026-01-19 Thread via GitHub
GaneshPatil7517 commented on code in PR #19863: URL: https://github.com/apache/datafusion/pull/19863#discussion_r2706760703 ## docs/Dockerfile: ## @@ -0,0 +1,35 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the N

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2707178001 ## datafusion/core/src/physical_planner.rs: ## @@ -1907,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELETE/

[PR] feat: change Expr OuterReferenceColumn and Alias to Box type for reducing expr struct size [datafusion]

2026-01-19 Thread via GitHub
zhuqi-lucas opened a new pull request, #16771: URL: https://github.com/apache/datafusion/pull/16771 ## Which issue does this PR close? Continue to reduce the Expr struct size. - Closes [#16770](https://github.com/apache/datafusion/issues/16770) ## Rationale for this chang
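Why boxing a variant shrinks the whole enum: a Rust enum is at least as large as its largest variant, so one large inline payload makes every `Expr` value that size, even a small literal. Boxing moves the payload to the heap and leaves a pointer in the enum. The types here are hypothetical stand-ins, not DataFusion's `Expr`, and exact byte counts depend on the target.

```rust
use std::mem::size_of;

/// Stand-in for a large variant payload, e.g. an alias with a name,
/// an optional qualifier and extra metadata (hypothetical layout).
#[allow(dead_code)]
struct BigPayload {
    name: String,
    qualifier: Option<String>,
    extra: [u64; 4],
}

/// Payload stored inline: every value pays for the largest variant.
#[allow(dead_code)]
enum ExprInline {
    Literal(i64),
    Alias(BigPayload),
}

/// Payload boxed: the variant holds only a pointer.
#[allow(dead_code)]
enum ExprBoxed {
    Literal(i64),
    Alias(Box<BigPayload>),
}
```

The trade-off is an extra heap allocation and pointer indirection whenever the boxed variant is built or read, in exchange for smaller `Expr` values everywhere else.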

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2707170578 ## datafusion/core/src/physical_planner.rs: ## @@ -1907,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELETE/

[I] [datafusion-spark] add `unix_date/micros/millis/seconds` functions [datafusion]

2026-01-19 Thread via GitHub
cht42 opened a new issue, #19891: URL: https://github.com/apache/datafusion/issues/19891 ### Is your feature request related to a problem or challenge? Add the following spark functions - https://spark.apache.org/docs/latest/api/sql/index.html#unix_date - https://spark.apache.org

Re: [PR] Fix struct casts to align fields by name (prevent positional mis-casts) [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on PR #19674: URL: https://github.com/apache/datafusion/pull/19674#issuecomment-3770961364 @adriangb Thanks for the clarification. Amended it from: `When two structs have the same set of field names (possibly in different order), coerce by name.` to `When

Re: [PR] Support "pre-image" for pruning predicate evaluation #1 [datafusion]

2026-01-19 Thread via GitHub
alamb commented on PR #19722: URL: https://github.com/apache/datafusion/pull/19722#issuecomment-3769665488 I merged up from to fix clippy

Re: [PR] Support API for "pre-image" for pruning predicate evaluation [datafusion]

2026-01-19 Thread via GitHub
sdf-jkl commented on PR #19722: URL: https://github.com/apache/datafusion/pull/19722#issuecomment-3769709701 I'll take a look at the failing tests

[PR] [EXPERIMENTAL] Add cost-based optimizer (CBO) for Comet vs Spark execution [datafusion-comet]

2026-01-19 Thread via GitHub
andygrove opened a new pull request, #3220: URL: https://github.com/apache/datafusion-comet/pull/3220 ## Summary This PR introduces an **experimental** lightweight cost-based optimizer (CBO) that estimates whether a Comet query plan will be faster than a Spark plan, falling back to S
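A toy version of the decision described above: estimate a cost for running the plan natively and a cost for running it in Spark, then fall back to Spark unless the native estimate is lower. Everything here is an assumption for illustration: the operator set, the per-row weights, the speedup factors, and the idea of charging a fixed per-row cost whenever execution switches between native and JVM operators. None of it is Comet's actual model.

```rust
/// Hypothetical operator kinds.
#[derive(Clone, Copy)]
enum Op {
    Scan,
    Filter,
    HashAggregate,
    Sort,
}

/// One operator in a linear plan, listed leaf first.
struct PlanNode {
    op: Op,
    rows: f64,
    native: bool, // whether this operator can run natively
}

/// Assumed per-row cost of each operator in Spark.
fn spark_cost_per_row(op: Op) -> f64 {
    match op { Op::Scan => 1.0, Op::Filter => 0.5, Op::HashAggregate => 2.0, Op::Sort => 3.0 }
}

/// Assumed speedup of native execution per operator.
fn native_speedup(op: Op) -> f64 {
    match op { Op::Scan => 2.0, Op::Filter => 3.0, Op::HashAggregate => 2.5, Op::Sort => 2.0 }
}

/// Assumed per-row cost of a columnar/row transition between native and JVM operators.
const TRANSITION_COST_PER_ROW: f64 = 0.8;

/// Returns (estimated native-plan cost, estimated Spark cost).
fn estimate_costs(plan: &[PlanNode]) -> (f64, f64) {
    let spark: f64 = plan.iter().map(|n| n.rows * spark_cost_per_row(n.op)).sum();
    let mut comet = 0.0;
    let mut prev_native: Option<bool> = None;
    for n in plan {
        let base = n.rows * spark_cost_per_row(n.op);
        comet += if n.native { base / native_speedup(n.op) } else { base };
        // Charge a transition whenever execution switches engines.
        if prev_native.is_some_and(|p| p != n.native) {
            comet += n.rows * TRANSITION_COST_PER_ROW;
        }
        prev_native = Some(n.native);
    }
    (comet, spark)
}

/// Fall back to Spark unless the native plan is estimated to be strictly cheaper.
fn prefer_comet(plan: &[PlanNode]) -> bool {
    let (comet, spark) = estimate_costs(plan);
    comet < spark
}
```

Under these made-up weights, a fully native plan wins easily, while a plan that alternates between native and Spark operators can lose because the transition costs outweigh the per-operator speedups.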

Re: [PR] feat: add complex type support to native Parquet writer [datafusion-comet]

2026-01-19 Thread via GitHub
andygrove commented on PR #3214: URL: https://github.com/apache/datafusion-comet/pull/3214#issuecomment-3769620049 Thanks for the thorough review @wForget. I refactored the test framework and added assertions that the plans are running as intended (Spark vs Comet).

Re: [PR] Allow struct field access projections to be pushed down into scans [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on code in PR #19538: URL: https://github.com/apache/datafusion/pull/19538#discussion_r2705741301 ## datafusion/expr/src/udf.rs: ## @@ -846,6 +851,18 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send + Sync { fn documentation(&self) -> Option<&D

[PR] chore: update datafusion to 52.0 [datafusion-ballista]

2026-01-19 Thread via GitHub
killzoner opened a new pull request, #1394: URL: https://github.com/apache/datafusion-ballista/pull/1394 # Which issue does this PR close? Closes https://github.com/apache/datafusion-ballista/issues/1357#issuecomment-3767799814. # Rationale for this change D

Re: [PR] Pass Field information back and forth when using scalar UDFs [datafusion-python]

2026-01-19 Thread via GitHub
Copilot commented on code in PR #1299: URL: https://github.com/apache/datafusion-python/pull/1299#discussion_r2705860116 ## python/datafusion/user_defined.py: ## @@ -212,23 +237,25 @@ def _function( name = func.__qualname__.lower() else:

Re: [PR] chore: update datafusion to 52.0 [datafusion-ballista]

2026-01-19 Thread via GitHub
killzoner commented on code in PR #1394: URL: https://github.com/apache/datafusion-ballista/pull/1394#discussion_r2705872478 ## ballista/core/src/execution_plans/shuffle_writer.rs: ## @@ -263,6 +263,10 @@ impl ShuffleWriterExec { let mut partitioner = Batch

Re: [I] `ParquetOpener` fails on files without `PageIndex` metadata [datafusion]

2026-01-19 Thread via GitHub
friendlymatthew commented on issue #19839: URL: https://github.com/apache/datafusion/issues/19839#issuecomment-3769807336 > You’re absolutely right—setting the policy to `Optional` is the right first step to prevent the initial "hard crash" on files without metadata, but it doesn't solve th

Re: [I] Support `ListView`, `LargeListView` in `ScalarValue` [datafusion]

2026-01-19 Thread via GitHub
dqkqd commented on issue #18886: URL: https://github.com/apache/datafusion/issues/18886#issuecomment-3770462745 > @dqkqd Are you still working on this issue? Sorry, I am not, please remove me.

Re: [PR] feat: implement protobuf converter trait to allow control over serialization and deserialization processes [datafusion]

2026-01-19 Thread via GitHub
timsaucer commented on PR #19437: URL: https://github.com/apache/datafusion/pull/19437#issuecomment-3770472953 It turns out datafusion-python is using `PhysicalPlanNode::try_from_physical_plan()` and others so we do need an entry in the upgrade guide.

Re: [PR] Fix struct casts to align fields by name (prevent positional mis-casts) [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on PR #19674: URL: https://github.com/apache/datafusion/pull/19674#issuecomment-3770940810 > @adriangb > > ``` > The OP still says: > > > When two structs have the same set of field names (possibly in different order), coerce by name. > > Otherwise, pre

Re: [PR] Fix struct casts to align fields by name (prevent positional mis-casts) [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on PR #19674: URL: https://github.com/apache/datafusion/pull/19674#issuecomment-3770863969 @adriangb ``` The OP still says: > When two structs have the same set of field names (possibly in different order), coerce by name. > Otherwise, preserve prior beh

Re: [PR] Fix struct casts to align fields by name (prevent positional mis-casts) [datafusion]

2026-01-19 Thread via GitHub
adriangb commented on code in PR #19674: URL: https://github.com/apache/datafusion/pull/19674#discussion_r2706697110 ## datafusion/common/src/nested_struct.rs: ## @@ -31,6 +31,7 @@ use std::sync::Arc; /// /// ## Field Matching Strategy /// - **By Name**: Source struct fields

Re: [PR] feat: Creating SubstraitSchedulerClient and standalone Substrait examples [datafusion-ballista]

2026-01-19 Thread via GitHub
mattcuento commented on PR #1376: URL: https://github.com/apache/datafusion-ballista/pull/1376#issuecomment-3771106271 Hey @milenkovicm 👋 thanks for the feedback, you're probably right. > Could you just use in memory catalog on the scheduler (initialised with custom scheduler) to ke

[PR] Fix Python UDAF list-of-timestamps return by enforcing list-valued scalars and caching PyArrow types [datafusion-python]

2026-01-19 Thread via GitHub
kosiew opened a new pull request, #1347: URL: https://github.com/apache/datafusion-python/pull/1347 ## Which issue does this PR close? * Closes #1339. ## Rationale for this change Python UDAFs that logically return a *list* (e.g., “collect all timestamps for a group”) we

Re: [I] Log pollution from `Record batch memory usage exceeds the expected limit` [datafusion]

2026-01-19 Thread via GitHub
xudong963 commented on issue #19846: URL: https://github.com/apache/datafusion/issues/19846#issuecomment-3771314322 @2010YOUY01 I could add some details about when the log occurs in our production!

Re: [PR] chore: reduce production noise by using `debug` macro [datafusion]

2026-01-19 Thread via GitHub
xudong963 commented on PR #19885: URL: https://github.com/apache/datafusion/pull/19885#issuecomment-3771318119 > adjusting `SPILL_BATCH_MEMORY_MARGIN` Looks like we can make the config flexible first, such as for string view, the memory margin could be larger, right? @2010YOUY01

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2707034929 ## datafusion/core/tests/custom_sources_cases/dml_planning.rs: ## @@ -246,6 +269,75 @@ async fn test_delete_complex_expr() -> Result<()> { Ok(()) } +#[tokio

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2707055430 ## datafusion/core/src/physical_planner.rs: ## @@ -1912,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELETE/

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2707068707 ## datafusion/core/src/physical_planner.rs: ## @@ -1907,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELETE/

[PR] Fix/parquet opener page index policy [datafusion]

2026-01-19 Thread via GitHub
aviralgarg05 opened a new pull request, #19890: URL: https://github.com/apache/datafusion/pull/19890 ## Which issue does this PR close? - Closes #19839. ## Rationale for this change The `ParquetOpener`

Re: [PR] feat(spark): implement add_months function [datafusion]

2026-01-19 Thread via GitHub
Jefffrey commented on code in PR #19711: URL: https://github.com/apache/datafusion/pull/19711#discussion_r2703694158 ## datafusion/sqllogictest/test_files/spark/datetime/add_months.slt: ## @@ -15,13 +15,38 @@ # specific language governing permissions and limitations # under th

Re: [PR] perf: Optimize round scalar performance [datafusion]

2026-01-19 Thread via GitHub
Jefffrey merged PR #19831: URL: https://github.com/apache/datafusion/pull/19831

Re: [PR] feat(spark): Add `SessionStateBuilderSpark` to datafusion-spark [datafusion]

2026-01-19 Thread via GitHub
cht42 commented on code in PR #19865: URL: https://github.com/apache/datafusion/pull/19865#discussion_r2703677478 ## datafusion/spark/src/lib.rs: ## @@ -93,10 +93,42 @@ //! ``` //! //![`Expr`]: datafusion_expr::Expr +//! +//! # Example: enabling Apache Spark features with Ses

Re: [PR] perf: Optimize round scalar performance [datafusion]

2026-01-19 Thread via GitHub
Jefffrey commented on PR #19831: URL: https://github.com/apache/datafusion/pull/19831#issuecomment-3766999740 Thanks @kumarUjjawal & @martin-g

Re: [PR] feat(spark): implement add_months function [datafusion]

2026-01-19 Thread via GitHub
cht42 commented on code in PR #19711: URL: https://github.com/apache/datafusion/pull/19711#discussion_r2703686463 ## datafusion/sqllogictest/test_files/spark/datetime/add_months.slt: ## @@ -15,13 +15,38 @@ # specific language governing permissions and limitations # under the L

Re: [PR] feat(memory-tracking): implement arrow_buffer::MemoryPool for MemoryPool [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #18928: URL: https://github.com/apache/datafusion/pull/18928#discussion_r2703710526 ## datafusion/execution/src/memory_pool/arrow.rs: ## @@ -0,0 +1,121 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

[PR] chore(deps): bump thiserror from 2.0.17 to 2.0.18 in /native [datafusion-comet]

2026-01-19 Thread via GitHub
dependabot[bot] opened a new pull request, #3218: URL: https://github.com/apache/datafusion-comet/pull/3218 Bumps [thiserror](https://github.com/dtolnay/thiserror) from 2.0.17 to 2.0.18. Release notes sourced from thiserror's releases (https://github.com/dtolnay/thiserror/releases).

[PR] chore(deps): bump serde_json from 1.0.148 to 1.0.149 in /native [datafusion-comet]

2026-01-19 Thread via GitHub
dependabot[bot] opened a new pull request, #3219: URL: https://github.com/apache/datafusion-comet/pull/3219 Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.148 to 1.0.149. Release notes sourced from serde_json's releases (https://github.com/serde-rs/json/releases).

[PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew opened a new pull request, #19884: URL: https://github.com/apache/datafusion/pull/19884 ## Which issue does this PR close? * Closes #19840. --- ## Rationale for this change When a `TableProvider` supports filter pushdown (for example `TableProviderFilte
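A sketch of the failure mode, assuming (as the PR title and the `filters` discussion on `TableScan` suggest) that once a provider accepts a predicate via pushdown, the predicate can live in the scan's filter list instead of a separate `Filter` node, so extraction that only walks `Filter` nodes misses it. The plan type and `extract_dml_filters` are hypothetical simplifications, not DataFusion's `LogicalPlan` or the planner code.

```rust
/// Hypothetical, heavily simplified logical plan; predicates are plain strings.
#[derive(Debug)]
enum Plan {
    /// A standalone filter node above its input.
    Filter { predicate: String, input: Box<Plan> },
    /// A scan carrying predicates the provider accepted via pushdown.
    TableScan { filters: Vec<String> },
    /// Any other single-input node (e.g. a projection).
    Other { input: Box<Plan> },
}

/// Collect DML predicates from both Filter nodes and the scan's pushed-down filters.
fn extract_dml_filters(plan: &Plan) -> Vec<String> {
    let mut out = Vec::new();
    let mut node = plan;
    loop {
        match node {
            Plan::Filter { predicate, input } => {
                out.push(predicate.clone());
                node = &**input;
            }
            Plan::TableScan { filters } => {
                // Without this arm, predicates pushed into the scan are silently lost.
                out.extend(filters.iter().cloned());
                break;
            }
            Plan::Other { input } => node = &**input,
        }
    }
    out
}
```

The sketch ignores one real complication: with inexact pushdown, DataFusion keeps the `Filter` node and also passes the predicate to the scan, so a real implementation must avoid collecting the same predicate twice.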

Re: [I] `TableProvider::delete_from` problem with pushed down filters [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on issue #19840: URL: https://github.com/apache/datafusion/issues/19840#issuecomment-3767435860 @mjgarton Your explanation does help 😄

Re: [PR] Preserve input field nullability in ArrayAgg return field [datafusion]

2026-01-19 Thread via GitHub
Jefffrey commented on code in PR #19868: URL: https://github.com/apache/datafusion/pull/19868#discussion_r2703670040 ## datafusion/functions-nested/src/sort.rs: ## @@ -137,10 +137,18 @@ impl ScalarUDFImpl for ArraySort { match &arg_types[0] { DataType::Null

Re: [PR] Refactor ListArray hashing to consider only sliced values [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19500: URL: https://github.com/apache/datafusion/pull/19500#discussion_r2703645171 ## datafusion/common/src/hash_utils.rs: ## @@ -513,24 +514,41 @@ fn hash_list_array( where OffsetSize: OffsetSizeTrait, { -let values = array.values(); -
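What "consider only sliced values" can mean in practice: a sliced list array shares its parent's child buffer, so hashing the entire child array (as the removed `let values = array.values();` line hints) does work for rows outside the slice. A simplified model under that reading, with hypothetical types instead of Arrow's `GenericListArray`: hash only the child values referenced by the slice, then rebase the offsets onto that range.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified list array: row i spans values[offsets[i]..offsets[i + 1]].
/// A slice shares the parent's buffers and records only `start` and `len`.
struct SlicedList<'a> {
    offsets: &'a [usize],
    values: &'a [i64],
    start: usize,
    len: usize,
}

fn hash_one<T: Hash + ?Sized>(v: &T) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

impl<'a> SlicedList<'a> {
    /// The child values this slice actually references.
    fn referenced_values(&self) -> &'a [i64] {
        let values: &'a [i64] = self.values;
        &values[self.offsets[self.start]..self.offsets[self.start + self.len]]
    }

    /// Hash each row, hashing only the referenced child values.
    fn row_hashes(&self) -> Vec<u64> {
        let first = self.offsets[self.start];
        let value_hashes: Vec<u64> = self.referenced_values().iter().map(|v| hash_one(v)).collect();
        (self.start..self.start + self.len)
            .map(|i| {
                // Offsets index the parent buffer, so rebase them onto `value_hashes`.
                let (lo, hi) = (self.offsets[i] - first, self.offsets[i + 1] - first);
                hash_one(&value_hashes[lo..hi])
            })
            .collect()
    }
}
```

A useful property to test is that a slice hashes exactly like a compact array holding the same rows.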

Re: [PR] doc: Add Ballista extensions example to the docs. [datafusion-ballista]

2026-01-19 Thread via GitHub
LouisBurke commented on PR #1382: URL: https://github.com/apache/datafusion-ballista/pull/1382#issuecomment-3767321025 Hi @milenkovicm , I had some other small edits and comments but fair enough. One thing is how we would host code examples? My only idea was to add the example code to

Re: [I] `ParquetOpener` fails on files without `PageIndex` metadata [datafusion]

2026-01-19 Thread via GitHub
aviralgarg05 commented on issue #19839: URL: https://github.com/apache/datafusion/issues/19839#issuecomment-3767318548 You’re absolutely right—setting the policy to `Optional` is the right first step to prevent the initial "hard crash" on files without metadata, but it doesn't solve the dow

Re: [PR] [WIP] Ballista substrait examples [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm commented on PR #1376: URL: https://github.com/apache/datafusion-ballista/pull/1376#issuecomment-3767336888 I'll have a better look later, and I may be wrong, but I have a feeling you're overcomplicating it a bit. This is just an example; it does not have to be perfect. Could

Re: [PR] doc: Add Ballista extensions example to the docs. [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm commented on PR #1382: URL: https://github.com/apache/datafusion-ballista/pull/1382#issuecomment-3767344403 If you have further changes, please open a new PR. We can store them in the examples folder, but I did not want to bring in such a big codebase for now.

[PR] chore(deps): bump cc from 1.2.52 to 1.2.53 in /native [datafusion-comet]

2026-01-19 Thread via GitHub
dependabot[bot] opened a new pull request, #3217: URL: https://github.com/apache/datafusion-comet/pull/3217 Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.2.52 to 1.2.53. Release notes sourced from cc's releases (https://github.com/rust-lang/cc-rs/releases). cc-v1.2.53

[PR] chore(deps): bump url from 2.5.7 to 2.5.8 in /native [datafusion-comet]

2026-01-19 Thread via GitHub
dependabot[bot] opened a new pull request, #3216: URL: https://github.com/apache/datafusion-comet/pull/3216 Bumps [url](https://github.com/servo/rust-url) from 2.5.7 to 2.5.8. Commits: d6ea13c (https://github.com/servo/rust-url/commit/d6ea13c5f8e7e6e627f6390161b3e185bda5e5ce) Bu

Re: [PR] docs: add Docker-based workflow for building documentation [datafusion]

2026-01-19 Thread via GitHub
Jefffrey commented on code in PR #19863: URL: https://github.com/apache/datafusion/pull/19863#discussion_r2703819062 ## docs/README.md: ## @@ -25,11 +25,31 @@ https://datafusion.apache.org/ as part of the release process. ## Dependencies +### Option 1: Docker (Recommended)

Re: [PR] feat(spark): implement add_months function [datafusion]

2026-01-19 Thread via GitHub
cht42 commented on code in PR #19711: URL: https://github.com/apache/datafusion/pull/19711#discussion_r2703895609 ## datafusion/sqllogictest/test_files/spark/datetime/add_months.slt: ## @@ -15,13 +15,38 @@ # specific language governing permissions and limitations # under the L

Re: [I] Investigate use of `DynamicFilters` in ballista [datafusion-ballista]

2026-01-19 Thread via GitHub
milenkovicm commented on issue #1375: URL: https://github.com/apache/datafusion-ballista/issues/1375#issuecomment-3767270050 I'm not sure we should share them between executors; it would just get too complex. Perhaps if we focus: - could we use them per partition? - could

[PR] chore(deps): bump taiki-e/install-action from 2.66.5 to 2.66.7 [datafusion]

2026-01-19 Thread via GitHub
dependabot[bot] opened a new pull request, #19883: URL: https://github.com/apache/datafusion/pull/19883 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.66.5 to 2.66.7. Release notes sourced from https://github.com/taiki-e/install-action/releases t

Re: [PR] docs: add Docker-based workflow for building documentation [datafusion]

2026-01-19 Thread via GitHub
GaneshPatil7517 commented on code in PR #19863: URL: https://github.com/apache/datafusion/pull/19863#discussion_r2704105290 ## docs/README.md: ## @@ -25,11 +25,31 @@ https://datafusion.apache.org/ as part of the release process. ## Dependencies +### Option 1: Docker (Recom

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
mjgarton commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2704212209 ## datafusion/core/src/physical_planner.rs: ## @@ -1912,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELET

Re: [I] Update to DataFusion v.52 [datafusion-ballista]

2026-01-19 Thread via GitHub
killzoner commented on issue #1357: URL: https://github.com/apache/datafusion-ballista/issues/1357#issuecomment-3767772134 Hey, sure, I can take this one. I'm not sure I see the link with https://github.com/apache/datafusion-ballista/issues/1357 though; is there a patch to reproduce at balli

Re: [PR] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan [datafusion]

2026-01-19 Thread via GitHub
kosiew commented on code in PR #19884: URL: https://github.com/apache/datafusion/pull/19884#discussion_r2704403935 ## datafusion/core/src/physical_planner.rs: ## @@ -1912,24 +1907,48 @@ fn get_physical_expr_pair( } /// Extract filter predicates from a DML input plan (DELETE/

Re: [PR] feat: Add batch coalescing ability to shuffle reader exec [datafusion-ballista]

2026-01-19 Thread via GitHub
Dandandan commented on PR #1380: URL: https://github.com/apache/datafusion-ballista/pull/1380#issuecomment-3768100430 For benchmarking the performance change for small batches, it is probably better to run against Parquet (to make the scan less of a bottleneck) and with a high number of tas

Re: [I] Log pollution from `Record batch memory usage exceeds the expected limit` [datafusion]

2026-01-19 Thread via GitHub
2010YOUY01 commented on issue #19846: URL: https://github.com/apache/datafusion/issues/19846#issuecomment-3768160778 > I suggest we do both > > We definitely shouldn't be `warn`ing for a known issue. This is not a known issue; it is more like an assertion guarding against a pot

Re: [I] Log pollution from `Record batch memory usage exceeds the expected limit` [datafusion]

2026-01-19 Thread via GitHub
AnjaliChoudhary99 commented on issue #19846: URL: https://github.com/apache/datafusion/issues/19846#issuecomment-3767577616 Hi, can you assign it to me? @alamb

[PR] chore: reduce production noise by using `debug` macro [datafusion]

2026-01-19 Thread via GitHub
Standing-Man opened a new pull request, #19885: URL: https://github.com/apache/datafusion/pull/19885 ## Which issue does this PR close? - Closes #19846. ## Rationale for this change ## What changes are included in this PR? ## Are these chang

Re: [I] Log pollution from `Record batch memory usage exceeds the expected limit` [datafusion]

2026-01-19 Thread via GitHub
kumarUjjawal commented on issue #19846: URL: https://github.com/apache/datafusion/issues/19846#issuecomment-3767584544 Hi @AnjaliChoudhary99, you can assign it to yourself by typing `take` in the comments.
