Re: [PR] feat: Implement Spark function `space` [datafusion]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #19610: URL: https://github.com/apache/datafusion/pull/19610#discussion_r2657943917 ## datafusion/sqllogictest/test_files/spark/string/space.slt: ## @@ -0,0 +1,41 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

Re: [PR] feat: Implement Spark function `space` [datafusion]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #19610: URL: https://github.com/apache/datafusion/pull/19610#discussion_r2657942992 ## datafusion/spark/src/function/string/space.rs: ## @@ -0,0 +1,178 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic
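
Spark's `space(n)` returns a string of `n` space characters. It returns an empty string when `n` is negative and NULL when the input is NULL. The Rust sketch below models those per-row semantics, with `Option<i32>` standing in for a nullable Int32 value. `spark_space` and `spark_space_column` are hypothetical names. The PR's own code is cut off above and presumably operates on Arrow arrays rather than slices.

```rust
/// Sketch of Spark `space(n)` semantics for one nullable value.
/// `None` models SQL NULL; negative counts are clamped to 0 (empty string).
fn spark_space(n: Option<i32>) -> Option<String> {
    n.map(|count| " ".repeat(count.max(0) as usize))
}

/// Applies `spark_space` row by row to a column of nullable inputs.
fn spark_space_column(values: &[Option<i32>]) -> Vec<Option<String>> {
    values.iter().map(|v| spark_space(*v)).collect()
}
```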

Re: [PR] chore: Add microbenchmark for casting string to numeric [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #2979: URL: https://github.com/apache/datafusion-comet/pull/2979#discussion_r2658034465 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometCastStringToNumericBenchmark.scala: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software F

Re: [PR] chore: Add microbenchmark for casting string to numeric [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove merged PR #2979: URL: https://github.com/apache/datafusion-comet/pull/2979 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

[I] Add ability for microbenchmarks to verify that queries actually ran in Comet [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new issue, #3019: URL: https://github.com/apache/datafusion-comet/issues/3019 ### What is the problem the feature request solves? When we run the microbenchmarks, we don't really know if the queries are running in Comet. It would be good to add a feature to allow th

Re: [PR] perf: Improve performance of normalize_nan [datafusion-comet]

2026-01-02 Thread via GitHub
comphead commented on code in PR #2999: URL: https://github.com/apache/datafusion-comet/pull/2999#discussion_r2658043716 ## native/spark-expr/src/math_funcs/internal/normalize_nan.rs: ## @@ -78,14 +79,16 @@ impl PhysicalExpr for NormalizeNaNAndZero { match &self.data_
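
`NormalizeNaNAndZero` follows Spark's floating-point normalization. Every NaN bit pattern is mapped to one canonical NaN, and `-0.0` is mapped to `0.0`, so that grouping and join keys treat them as equal. The following scalar Rust sketch illustrates that rule using hypothetical helper names and plain `f64` slices instead of Arrow arrays. It does not show how the PR makes the expression faster.

```rust
/// Maps every NaN to the canonical `f64::NAN` and -0.0 to +0.0;
/// all other values pass through unchanged.
fn normalize_nan_and_zero(v: f64) -> f64 {
    if v.is_nan() {
        f64::NAN
    } else if v == 0.0 {
        // -0.0 == 0.0 is true, so both signed zeros become +0.0 here.
        0.0
    } else {
        v
    }
}

/// Applies the normalization to every element of a slice in place.
fn normalize_in_place(values: &mut [f64]) {
    for v in values.iter_mut() {
        *v = normalize_nan_and_zero(*v);
    }
}
```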

Re: [PR] chore: Add microbenchmark for casting string to numeric [datafusion-comet]

2026-01-02 Thread via GitHub
comphead commented on code in PR #2979: URL: https://github.com/apache/datafusion-comet/pull/2979#discussion_r2658031162 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometCastStringToNumericBenchmark.scala: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] perf: Improve performance of normalize_nan [datafusion-comet]

2026-01-02 Thread via GitHub
sqlbenchmark commented on PR #2999: URL: https://github.com/apache/datafusion-comet/pull/2999#issuecomment-3705752192 ## Comet TPC-H Benchmark Results **Commit:** `d774b1a` - revert **Scale Factor:** SF100 **Iterations:** 1 ### Query Times | Query | Time (s) | Quer

Re: [PR] feat: support pushdown alias on dynamic filter with `ProjectionExec` [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19404: URL: https://github.com/apache/datafusion/pull/19404#discussion_r2658026051 ## datafusion/physical-expr/src/utils/mod.rs: ## @@ -238,6 +239,21 @@ pub fn collect_columns(expr: &Arc) -> HashSet { columns } +pub fn have_unknown_colu

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
mkleen commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658049789 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(&s

Re: [I] [BUG] Error when adding Date32 and Int64 [datafusion]

2026-01-02 Thread via GitHub
RSAgr commented on issue #12342: URL: https://github.com/apache/datafusion/issues/12342#issuecomment-3705769838 Got it 👍

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
mkleen commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658055879 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(&s

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658055048 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(

Re: [PR] chore: Add TPCDS benchmark comparison for PR [datafusion]

2026-01-02 Thread via GitHub
comphead merged PR #19552: URL: https://github.com/apache/datafusion/pull/19552

Re: [PR] Refactor `percentile_cont` to clarify support input types [datafusion]

2026-01-02 Thread via GitHub
comphead commented on code in PR #19611: URL: https://github.com/apache/datafusion/pull/19611#discussion_r2658014079 ## datafusion/functions-aggregate/src/percentile_cont.rs: ## @@ -297,76 +212,71 @@ impl AggregateUDFImpl for PercentileCont { ]) } -fn accumul

Re: [PR] Refactor `percentile_cont` to clarify support input types [datafusion]

2026-01-02 Thread via GitHub
comphead commented on code in PR #19611: URL: https://github.com/apache/datafusion/pull/19611#discussion_r2658015196 ## datafusion/functions-aggregate/src/percentile_cont.rs: ## @@ -297,76 +212,71 @@ impl AggregateUDFImpl for PercentileCont { ]) } -fn accumul

Re: [PR] feat: Implement Spark function `space` [datafusion]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #19610: URL: https://github.com/apache/datafusion/pull/19610#discussion_r2657906973 ## datafusion/spark/src/function/string/space.rs: ## @@ -0,0 +1,178 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: Implement Spark function `space` [datafusion]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #19610: URL: https://github.com/apache/datafusion/pull/19610#discussion_r2657908538 ## datafusion/spark/src/function/string/space.rs: ## @@ -0,0 +1,178 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[PR] Fix typo in contributor guide architecture section [datafusion]

2026-01-02 Thread via GitHub
cdegroc opened a new pull request, #19613: URL: https://github.com/apache/datafusion/pull/19613 Fix a typo in the contributor guide architecture section.

Re: [PR] Refactor `percentile_cont` to clarify support input types [datafusion]

2026-01-02 Thread via GitHub
comphead commented on code in PR #19611: URL: https://github.com/apache/datafusion/pull/19611#discussion_r2658019218 ## datafusion/functions-aggregate/src/percentile_cont.rs: ## @@ -297,76 +212,71 @@ impl AggregateUDFImpl for PercentileCont { ]) } -fn accumul

Re: [PR] Fix typo in contributor guide architecture section [datafusion]

2026-01-02 Thread via GitHub
comphead merged PR #19613: URL: https://github.com/apache/datafusion/pull/19613

Re: [PR] chore: Add microbenchmark for casting string to numeric [datafusion-comet]

2026-01-02 Thread via GitHub
comphead commented on code in PR #2979: URL: https://github.com/apache/datafusion-comet/pull/2979#discussion_r2658022078 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometCastStringToNumericBenchmark.scala: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] chore: Add microbenchmark for casting string to numeric [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #2979: URL: https://github.com/apache/datafusion-comet/pull/2979#discussion_r2658025267 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometCastStringToNumericBenchmark.scala: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software F

Re: [PR] chore: Add microbenchmark for casting string to numeric [datafusion-comet]

2026-01-02 Thread via GitHub
comphead commented on code in PR #2979: URL: https://github.com/apache/datafusion-comet/pull/2979#discussion_r2658024784 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometCastStringToNumericBenchmark.scala: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
mkleen commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658065984 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(&s

Re: [PR] chore: Improve microbenchmark for string expressions [datafusion-comet]

2026-01-02 Thread via GitHub
comphead commented on code in PR #2964: URL: https://github.com/apache/datafusion-comet/pull/2964#discussion_r2658060782 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometStringExpressionBenchmark.scala: ## @@ -81,11 +74,22 @@ object CometStringExpressionBenchmark exte

Re: [PR] Add left function benchmark [datafusion]

2026-01-02 Thread via GitHub
viirya commented on PR #19600: URL: https://github.com/apache/datafusion/pull/19600#issuecomment-3705813140 Thanks @Jefffrey @andygrove

Re: [I] Attach `Diagnostic` to "invalid function argument types" error [datafusion]

2026-01-02 Thread via GitHub
RSAgr commented on issue #14431: URL: https://github.com/apache/datafusion/issues/14431#issuecomment-3705808273 Has this issue been solved?

Re: [I] Regression: `DataFrameWriteOptions::with_single_file_output` produces a directory [datafusion]

2026-01-02 Thread via GitHub
RSAgr commented on issue #13323: URL: https://github.com/apache/datafusion/issues/13323#issuecomment-3705827522 Is someone working on this?

Re: [PR] chore: Skip some CI workflows for benchmark changes [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3030: URL: https://github.com/apache/datafusion-comet/pull/3030#issuecomment-3706113816 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3030?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
mkleen commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658422929 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(&s

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
mkleen commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658432663 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(&s

[I] Improve performance of first/last aggregates [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new issue, #3022: URL: https://github.com/apache/datafusion-comet/issues/3022 ### What is the problem the feature request solves? Comet is slower than Spark for `first` and `last` aggregates. Also, the behavior is not consistent with Spark (see https://github.com/

[PR] perf: Improve aggregate expression microbenchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new pull request, #3021: URL: https://github.com/apache/datafusion-comet/pull/3021 ## Which issue does this PR close? Closes #. ## Rationale for this change Implement new, more comprehensive benchmarks for aggregates. ## What chan

Re: [PR] feat: add `LogicalPlanBuilderExt` trait to move some DataFrame convenience methods into builder [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on PR #19565: URL: https://github.com/apache/datafusion/pull/19565#issuecomment-3706470149 I think that since it's in the same repo now, real methods make more sense.

Re: [I] Add a way to create a new `FilterExec` directly with a projection [datafusion]

2026-01-02 Thread via GitHub
nuno-faria commented on issue #19608: URL: https://github.com/apache/datafusion/issues/19608#issuecomment-3706161261 @GaneshPatil7517 yes you can work on this. Just write a comment with "take" on this issue to be assigned.

[I] Comet writer to respect `object_store_settings` sent from Scala [datafusion-comet]

2026-01-02 Thread via GitHub
comphead opened a new issue, #3032: URL: https://github.com/apache/datafusion-comet/issues/3032 ### What is the problem the feature request solves? Comet writer gets created with default hadoop settings which works for test home distributed clusters, however it should also respect had

Re: [PR] [WIP] support PartialMerge [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #2918: URL: https://github.com/apache/datafusion-comet/pull/2918#issuecomment-3706168114 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2918?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] refactor: Use `Signature::coercible` for isnan/iszero/nanvl [datafusion]

2026-01-02 Thread via GitHub
martin-g commented on code in PR #19604: URL: https://github.com/apache/datafusion/pull/19604#discussion_r2658444311 ## datafusion/functions/src/math/nanvl.rs: ## Review Comment: Add an arm for Float16 ## datafusion/functions/src/math/nanvl.rs: ## @@ -97,
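
For context on the Float16 comment: `nanvl(x, y)` returns `x` unless `x` is NaN, in which case it returns `y`. Each supported float type therefore needs its own match arm. Below is a dependency-free Rust sketch of that rule. It uses a hypothetical local `IsNan` trait for `f32` and `f64` and ignores NULLs. In the Arrow ecosystem, a Float16 arm would work on `half::f16` values; the sketch omits those to avoid the extra crate.

```rust
/// Minimal local trait so one `nanvl` body covers several float widths.
trait IsNan: Copy {
    fn is_nan_value(self) -> bool;
}

impl IsNan for f32 {
    fn is_nan_value(self) -> bool {
        self.is_nan()
    }
}

impl IsNan for f64 {
    fn is_nan_value(self) -> bool {
        self.is_nan()
    }
}

/// `nanvl(x, y)`: returns `x` unless it is NaN, otherwise returns `y`.
fn nanvl<T: IsNan>(x: T, y: T) -> T {
    if x.is_nan_value() { y } else { x }
}

/// Element-wise `nanvl` over two equally long, non-null columns.
fn nanvl_column<T: IsNan>(xs: &[T], ys: &[T]) -> Vec<T> {
    xs.iter().zip(ys).map(|(&x, &y)| nanvl(x, y)).collect()
}
```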

Re: [PR] perf: Improve aggregate expression microbenchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3021: URL: https://github.com/apache/datafusion-comet/pull/3021#issuecomment-3706013846 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3021?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Improve performance of `in_list` expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new issue, #3027: URL: https://github.com/apache/datafusion-comet/issues/3027 ### What is the problem the feature request solves? From `CometComparisonExpressionBenchmark` (https://github.com/apache/datafusion-comet/pull/3026): ``` OpenJDK 64-Bit Server VM

Re: [PR] perf: Improve conditional expression microbenchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3024: URL: https://github.com/apache/datafusion-comet/pull/3024#issuecomment-3706051445 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3024?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Improve performance of `sha` hashing expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on issue #3029: URL: https://github.com/apache/datafusion-comet/issues/3029#issuecomment-3706184217 This should help: https://github.com/apache/datafusion/pull/19586

Re: [PR] chore(deps): bump taiki-e/install-action from 2.65.10 to 2.65.11 [datafusion]

2026-01-02 Thread via GitHub
comphead merged PR #19601: URL: https://github.com/apache/datafusion/pull/19601

Re: [PR] chore: Skip some CI workflows for benchmark changes [datafusion-comet]

2026-01-02 Thread via GitHub
comphead commented on code in PR #3030: URL: https://github.com/apache/datafusion-comet/pull/3030#discussion_r2658408978 ## .github/workflows/pr_benchmark_check.yml: ## @@ -0,0 +1,85 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658407424 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(

Re: [I] Add a way to create a new `FilterExec` directly with a projection [datafusion]

2026-01-02 Thread via GitHub
GaneshPatil7517 commented on issue #19608: URL: https://github.com/apache/datafusion/issues/19608#issuecomment-3706243425 Hi maintainers, I’d like to take this issue and start working on it. Thanks for the opportunity!

Re: [PR] perf: Optimize startsWith and endsWith string functions [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3000: URL: https://github.com/apache/datafusion-comet/pull/3000#issuecomment-3706249630 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3000?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Add a way to create a new `FilterExec` directly with a projection [datafusion]

2026-01-02 Thread via GitHub
nuno-faria commented on issue #19608: URL: https://github.com/apache/datafusion/issues/19608#issuecomment-3706272664 @GaneshPatil7517 it must be a comment with just the word "take" on it, like this: https://github.com/user-attachments/assets/828300d1-bb3f-436b-bc97-b0631edad7dc

Re: [PR] perf: Add microbenchmark for comparison expressions [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3026: URL: https://github.com/apache/datafusion-comet/pull/3026#issuecomment-3706081225 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3026?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Add a way to create a new `FilterExec` directly with a projection [datafusion]

2026-01-02 Thread via GitHub
GaneshPatil7517 commented on issue #19608: URL: https://github.com/apache/datafusion/issues/19608#issuecomment-3706082302 @nuno-faria Can you assign this issue to me? I want to work on this.

Re: [PR] perf: Add microbenchmark for hash expressions [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3028: URL: https://github.com/apache/datafusion-comet/pull/3028#issuecomment-3706086290 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3028?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] perf: Improve conditional expression microbenchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new pull request, #3024: URL: https://github.com/apache/datafusion-comet/pull/3024 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? Improvements Made

Re: [PR] perf: Improve conditional expression microbenchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on PR #3024: URL: https://github.com/apache/datafusion-comet/pull/3024#issuecomment-3706023152 ``` OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic AMD Ryzen 9 7950X3D 16-Core Processor Case When Literal (3 branches): Best Tim

[I] Improve performance of conditional expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new issue, #3025: URL: https://github.com/apache/datafusion-comet/issues/3025 ### What is the problem the feature request solves? Conditional expressions are slower when Comet is enabled. This will impact TPC-H and TPC-DS performance. ``` OpenJDK 64-Bit Se

[PR] perf: Add microbenchmark for comparison expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new pull request, #3026: URL: https://github.com/apache/datafusion-comet/pull/3026 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] perf: Improve performance of ltrim, rtrim, btrim [datafusion]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #19551: URL: https://github.com/apache/datafusion/pull/19551#discussion_r2658323654 ## datafusion/functions/src/string/common.rs: ## @@ -49,90 +49,69 @@ impl Display for TrimType { } } +/// Perform trim operation on input string with giv
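
The truncated doc comment refers to a shared trim helper driven by `TrimType`. Here is a hedged illustration of the semantics, not the PR's code. `ltrim`, `rtrim`, and `btrim` strip any character from a given set off the left end, the right end, or both ends. In Rust this can return a borrowed `&str` slice without allocating. The `TrimType` below is a local stand-in that mirrors the enum named in the diff, and `trim_with` is a hypothetical helper.

```rust
/// Which end(s) to trim, mirroring the ltrim / rtrim / btrim variants.
#[derive(Clone, Copy, Debug)]
enum TrimType {
    Left,
    Right,
    Both,
}

/// Removes any characters contained in `chars` from the chosen end(s) of `s`.
/// Returns a sub-slice of `s`, so no new string is allocated.
fn trim_with<'a>(s: &'a str, chars: &str, trim_type: TrimType) -> &'a str {
    let in_set = |c: char| chars.contains(c);
    match trim_type {
        TrimType::Left => s.trim_start_matches(in_set),
        TrimType::Right => s.trim_end_matches(in_set),
        TrimType::Both => s.trim_matches(in_set),
    }
}
```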

Re: [PR] perf: Improve performance of `CaseExpr` with many branches and non-literal THEN expressions [WIP] [datafusion]

2026-01-02 Thread via GitHub
andygrove closed pull request #19588: perf: Improve performance of `CaseExpr` with many branches and non-literal THEN expressions [WIP] URL: https://github.com/apache/datafusion/pull/19588

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658328311 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(

Re: [PR] perf: optimize factorial function performance [datafusion]

2026-01-02 Thread via GitHub
comphead commented on PR #19575: URL: https://github.com/apache/datafusion/pull/19575#issuecomment-3706209720 @getChan please resolve conflicts

Re: [PR] perf: optimize factorial function performance [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on PR #19575: URL: https://github.com/apache/datafusion/pull/19575#issuecomment-3706596491 Thanks @getChan & @comphead

Re: [PR] perf: optimize factorial function performance [datafusion]

2026-01-02 Thread via GitHub
Jefffrey merged PR #19575: URL: https://github.com/apache/datafusion/pull/19575
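
A common way to speed up integer factorial is a precomputed lookup table, because only 0! through 20! fit in an `i64`. The following is a sketch of that technique, not necessarily what PR #19575 does. It assumes the following semantics: inputs below 2 return 1, and inputs above 20 are reported as overflow (`None`). `FACTORIALS` and `factorial` are hypothetical names.

```rust
/// Builds n! for n in 0..=20 at compile time; 20! is the largest value that fits in i64.
const fn factorial_table() -> [i64; 21] {
    let mut table = [1i64; 21];
    let mut i = 1;
    while i < 21 {
        table[i] = table[i - 1] * i as i64;
        i += 1;
    }
    table
}

const FACTORIALS: [i64; 21] = factorial_table();

/// Table lookup instead of a per-row multiplication loop.
/// Returns `None` when n! would overflow i64 (assumed overflow signal).
fn factorial(n: i64) -> Option<i64> {
    if n < 2 {
        Some(1)
    } else if n <= 20 {
        Some(FACTORIALS[n as usize])
    } else {
        None
    }
}
```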

Re: [I] Accumulators which don't implement `retract_batch` can still exhibit buggy behaviour [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on issue #19612: URL: https://github.com/apache/datafusion/issues/19612#issuecomment-3706599028 Feel free to comment `take` to assign this issue to yourself if you feel you can tackle this: https://datafusion.apache.org/contributor-guide/index.html#open-contribution-and-a

Re: [PR] perf: Improve performance of ltrim, rtrim, btrim [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on PR #19551: URL: https://github.com/apache/datafusion/pull/19551#issuecomment-3706598528 Thanks @andygrove & @comphead

Re: [I] Continue to improve performance of `trim` [datafusion]

2026-01-02 Thread via GitHub
Jefffrey closed issue #12576: Continue to improve performance of `trim` URL: https://github.com/apache/datafusion/issues/12576

Re: [PR] perf: Improve performance of ltrim, rtrim, btrim [datafusion]

2026-01-02 Thread via GitHub
Jefffrey merged PR #19551: URL: https://github.com/apache/datafusion/pull/19551

[PR] perf: Improve date/time microbenchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new pull request, #3020: URL: https://github.com/apache/datafusion-comet/pull/3020 ## Which issue does this PR close? Closes #. ## Rationale for this change Stop running redundant benchmarks (YEAR, YYYY, YY are all equivalent and do not

Re: [PR] feat: Prune complex/nested predicates via statistics propagation [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19609: URL: https://github.com/apache/datafusion/pull/19609#discussion_r2658117712 ## datafusion/physical-expr-common/src/physical_expr/pruning.rs: ## @@ -0,0 +1,539 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] perf: Improve performance of date truncate [datafusion-comet]

2026-01-02 Thread via GitHub
sqlbenchmark commented on PR #2997: URL: https://github.com/apache/datafusion-comet/pull/2997#issuecomment-3705910260 ## Comet Microbenchmark Results: CometDatetimeExpressionBenchmark **Commit:** `cddee7b` - feat: Improve performance of date truncate ### Benchmark Results

Re: [PR] perf: Improve performance of date truncate [datafusion-comet]

2026-01-02 Thread via GitHub
sqlbenchmark commented on PR #2997: URL: https://github.com/apache/datafusion-comet/pull/2997#issuecomment-3705959831 ## Comet Microbenchmark Results: CometDatetimeExpressionBenchmark **Commit:** `cddee7b` - feat: Improve performance of date truncate ### Benchmark Results

Re: [I] Automated way to run benchmarks on a dedicated machine from PRs [datafusion]

2026-01-02 Thread via GitHub
andygrove commented on issue #18115: URL: https://github.com/apache/datafusion/issues/18115#issuecomment-3705840574 I have been working on implementing automated benchmarks for Comet. The approach I am using does not use any ASF infrastructure or GitHub runners. It is a simple

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658103826 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(

Re: [PR] perf: Improve performance of date truncate [datafusion-comet]

2026-01-02 Thread via GitHub
sqlbenchmark commented on PR #2997: URL: https://github.com/apache/datafusion-comet/pull/2997#issuecomment-3705863733 ## Comet Microbenchmark Results: CometDatetimeExpressionBenchmark **Commit:** `cddee7b` - feat: Improve performance of date truncate ### Benchmark Results

Re: [PR] perf: Improve date/time microbenchmarks to avoid redundant/duplicate benchmarks [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3020: URL: https://github.com/apache/datafusion-comet/pull/3020#issuecomment-3705919525 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3020?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] perf: Improve performance of normalize_nan [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on PR #2999: URL: https://github.com/apache/datafusion-comet/pull/2999#issuecomment-3705922693 Thanks for the review @comphead

Re: [PR] perf: Improve performance of normalize_nan [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove merged PR #2999: URL: https://github.com/apache/datafusion-comet/pull/2999

Re: [PR] Add config to disable columnar shuffle for complex types [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #2992: URL: https://github.com/apache/datafusion-comet/pull/2992#discussion_r2658172028 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala: ## @@ -403,23 +403,39 @@ object CometShuffleExchangeExec

Re: [PR] chore: Add microbenchmarks for all expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on PR #2984: URL: https://github.com/apache/datafusion-comet/pull/2984#issuecomment-3705953644 I am going to split this up into smaller PRs

Re: [I] [EPIC] Optimize performance for slow expressions [datafusion-comet]

2026-01-02 Thread via GitHub
raushanprabhakar1 commented on issue #2986: URL: https://github.com/apache/datafusion-comet/issues/2986#issuecomment-3706849908 Hi @andygrove, is there any documentation I can refer to for running the benchmark on my local device? I was working on the expression optimization, the bl

Re: [PR] Optimize `concat/concat_ws` scalar path by pre-allocating memory [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on code in PR #19547: URL: https://github.com/apache/datafusion/pull/19547#discussion_r2658768935 ## datafusion/functions/src/string/concat.rs: ## @@ -206,7 +207,11 @@ impl ScalarUDFImpl for ConcatFunc { DataType::Utf8View => {
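
Pre-allocation means summing the byte lengths of the pieces first and reserving the output buffer once, so the string never reallocates while it is being built. The following Rust sketch handles a single row. It assumes that NULL arguments are skipped (modelled as `None`). For the `concat_ws`-style variant, it also assumes that the separator goes only between non-null parts. The function names are hypothetical, and this is not the PR's Arrow code.

```rust
/// Concatenates the non-null parts after reserving the exact output size.
fn concat_preallocated(parts: &[Option<&str>]) -> String {
    // First pass: total bytes of all non-null parts.
    let total: usize = parts.iter().flatten().map(|s| s.len()).sum();
    let mut out = String::with_capacity(total);
    // Second pass: append without triggering any reallocation.
    for s in parts.iter().flatten() {
        out.push_str(s);
    }
    out
}

/// concat_ws-style variant: joins non-null parts with `sep`, reserving
/// space for the parts plus one separator between each adjacent pair.
fn concat_ws_preallocated(sep: &str, parts: &[Option<&str>]) -> String {
    let non_null: Vec<&str> = parts.iter().flatten().copied().collect();
    let total = non_null.iter().map(|s| s.len()).sum::<usize>()
        + sep.len() * non_null.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    for (i, s) in non_null.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        out.push_str(s);
    }
    out
}
```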

Re: [PR] perf: Improve string to int perf [datafusion-comet]

2026-01-02 Thread via GitHub
coderfender commented on PR #3017: URL: https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3706874748 Proceeding with some rather unsafe options to see if we can squeeze in further optimizations

Re: [PR] Fix NULL handling in ScalarValue::partial_cmp (closes #19579) [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on PR #19587: URL: https://github.com/apache/datafusion/pull/19587#issuecomment-3706873593 @Brijesh-Thakkar Please make sure you run the test suite before pushing and tagging for reviews. A simple `cargo test` still shows errors which conflicts with the PR body stating `A

Re: [PR] feat: Make GenericDistinctBuffer generic over both Hashable and native types [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on code in PR #18763: URL: https://github.com/apache/datafusion/pull/18763#discussion_r2658717474 ## benchmarks/src/bin/dfbench.rs: ## @@ -34,17 +34,20 @@ static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc; static ALLOC: mimalloc::MiMalloc = mimalloc

Re: [PR] feat: Make GenericDistinctBuffer generic over both Hashable and native types [datafusion]

2026-01-02 Thread via GitHub
ShashidharM0118 commented on code in PR #18763: URL: https://github.com/apache/datafusion/pull/18763#discussion_r2658722434 ## benchmarks/src/bin/dfbench.rs: ## @@ -34,17 +34,20 @@ static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc; static ALLOC: mimalloc::MiMalloc = m

Re: [I] Accumulators which don't implement `retract_batch` can still exhibit buggy behaviour [datafusion]

2026-01-02 Thread via GitHub
GaneshPatil7517 commented on issue #19612: URL: https://github.com/apache/datafusion/issues/19612#issuecomment-3706839577 take

Re: [PR] docs: Fix upgrade guide API examples for FileScanConfigBuilder and ParquetSource [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on PR #19397: URL: https://github.com/apache/datafusion/pull/19397#issuecomment-3706689251 Apologies if I put docs in the wrong version and caused confusion. This was a big change and I'm sure I made mistakes. It seems like there's already a lot of discussion underway, an

Re: [PR] Optimize `concat/concat_ws` scalar path by pre-allocating memory [datafusion]

2026-01-02 Thread via GitHub
lyne7-sc commented on code in PR #19547: URL: https://github.com/apache/datafusion/pull/19547#discussion_r2658728261 ## datafusion/functions/src/string/concat.rs: ## @@ -206,7 +207,11 @@ impl ScalarUDFImpl for ConcatFunc { DataType::Utf8View => {

Re: [PR] feat: add support for schema-scoped table functions [datafusion]

2026-01-02 Thread via GitHub
Jefffrey commented on PR #18022: URL: https://github.com/apache/datafusion/pull/18022#issuecomment-3706869015 We do have this existing issue: - https://github.com/apache/datafusion/issues/15095 I think it would make more sense to pursue the above issue as adding this many metho

Re: [PR] perf: Improve string to int perf [datafusion-comet]

2026-01-02 Thread via GitHub
coderfender commented on PR #3017: URL: https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3706870168 Results : ``` Running benchmark cast operation from : StringTyp

[I] Improve performance of `sha` hashing expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new issue, #3029: URL: https://github.com/apache/datafusion-comet/issues/3029 ### What is the problem the feature request solves? ``` OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic AMD Ryzen 9 7950X3D 16-Core Processor sha1:

[PR] perf: Add microbenchmark for hash expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new pull request, #3028: URL: https://github.com/apache/datafusion-comet/pull/3028 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[PR] perf: Implement more microbenchmarks for cast expressions [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove opened a new pull request, #3031: URL: https://github.com/apache/datafusion-comet/pull/3031 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Accumulators which don't implement `retract_batch` can still exhibit buggy behaviour [datafusion]

2026-01-02 Thread via GitHub
GaneshPatil7517 commented on issue #19612: URL: https://github.com/apache/datafusion/issues/19612#issuecomment-3706081034 @Jefffrey Can you assign this issue to me? I want to work on this.

Re: [PR] Add heap_size to statistics [datafusion]

2026-01-02 Thread via GitHub
adriangb commented on code in PR #19599: URL: https://github.com/apache/datafusion/pull/19599#discussion_r2658327838 ## datafusion/common/src/stats.rs: ## @@ -321,6 +321,13 @@ impl Statistics { } } +/// Returns the memory size in bytes. +pub fn heap_size(

Re: [PR] perf: Implement more microbenchmarks for cast expressions [datafusion-comet]

2026-01-02 Thread via GitHub
codecov-commenter commented on PR #3031: URL: https://github.com/apache/datafusion-comet/pull/3031#issuecomment-3706152694 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3031?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Skip some CI workflows for benchmark changes [datafusion-comet]

2026-01-02 Thread via GitHub
andygrove commented on code in PR #3030: URL: https://github.com/apache/datafusion-comet/pull/3030#discussion_r2658459943 ## .github/workflows/pr_benchmark_check.yml: ## @@ -0,0 +1,85 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] feat: add `LogicalPlanBuilderExt` trait to move some DataFrame convenience methods into builder [datafusion]

2026-01-02 Thread via GitHub
abhiaagarwal commented on PR #19565: URL: https://github.com/apache/datafusion/pull/19565#issuecomment-3706453088 > > > Why a new trait if we already have `LogicalPlanBuilder`? > > > > > > I'm fine with adding it to `LogicalPlanBuilder`, just trying to be conservative > > M

[PR] chore: Remove deprecated `Sql` field from `ExecuteQueryParams.Query` [datafusion-ballista]

2026-01-02 Thread via GitHub
mattcuento opened a new pull request, #1360: URL: https://github.com/apache/datafusion-ballista/pull/1360 # Which issue does this PR close? Closes #1358. # Rationale for this change The `sql` field was marked as deprecated from the Ballista protocol two year
