Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12471: URL: https://github.com/apache/datafusion/pull/12471#issuecomment-2367797256 Thank you again @waruto210 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12471: URL: https://github.com/apache/datafusion/pull/12471

Re: [I] TableScanExec return exact stats when it contain's filters [datafusion]

2024-09-23 Thread via GitHub
alamb closed issue #12416: TableScanExec return exact stats when it contain's filters URL: https://github.com/apache/datafusion/issues/12416
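
The idea behind this issue and PR #12471 can be sketched as follows. This is a minimal, self-contained illustration and not the code from the PR: the `Precision` enum below is a local stand-in, and `Scan`, `file_row_count`, and `has_pushed_down_filter` are hypothetical names introduced only for this example. The assumption is that a row count read from file metadata stops being trustworthy once a filter is pushed into the scan, so the scan reports the statistic as absent instead of exact.

```rust
/// Local stand-in for a statistic with a confidence level
/// (defined here only for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    /// The value is known to be exactly right.
    Exact(usize),
    /// The value is an estimate.
    Inexact(usize),
    /// Nothing is known.
    Absent,
}

/// Hypothetical scan node: a row count taken from file metadata,
/// plus a flag saying whether a filter was pushed into the scan.
pub struct Scan {
    pub file_row_count: Precision,
    pub has_pushed_down_filter: bool,
}

impl Scan {
    /// Row count this scan is willing to report to the planner.
    ///
    /// With a pushed-down filter the scan may emit fewer rows than the
    /// files contain, so the metadata count is reported as absent.
    pub fn num_rows(&self) -> Precision {
        if self.has_pushed_down_filter {
            Precision::Absent
        } else {
            self.file_row_count
        }
    }
}
```

With this sketch, a scan over files holding 100 rows reports `Precision::Exact(100)` without a filter and `Precision::Absent` once a filter is pushed down. Downgrading to `Inexact` would be another possible design; the PR title only mentions absent statistics, so that is what the sketch models.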

Re: [PR] Minor: add new() function for ParquetReadOptions [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12579: URL: https://github.com/apache/datafusion/pull/12579

Re: [PR] Add `SessionStateBuilder::with_object_store` method [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12578: URL: https://github.com/apache/datafusion/pull/12578

Re: [PR] make `Debug` for `MemoryExec` prettier [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12582: URL: https://github.com/apache/datafusion/pull/12582

Re: [PR] Add `SessionStateBuilder::with_object_store` method [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12578: URL: https://github.com/apache/datafusion/pull/12578#issuecomment-2367802774 Thanks again @OussamaSaoudi

Re: [PR] make `Debug` for `MemoryExec` prettier [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12582: URL: https://github.com/apache/datafusion/pull/12582#issuecomment-2367802085 Thanks @samuelcolvin and @lewiszlw. If you want prettier printing of MemoryExec, we could always consider implementing Display for it as well

Re: [I] Support Register object stores via SessionStateBuilder [datafusion]

2024-09-23 Thread via GitHub
alamb closed issue #12553: Support Register object stores via SessionStateBuilder URL: https://github.com/apache/datafusion/issues/12553

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-23 Thread via GitHub
notfilippo commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1771148601 ## datafusion/expr-common/src/scalar.rs: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-23 Thread via GitHub
notfilippo commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1771149183 ## datafusion/expr-common/src/scalar.rs: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-23 Thread via GitHub
Dandandan commented on code in PR #12471: URL: https://github.com/apache/datafusion/pull/12471#discussion_r1771166051 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -741,7 +741,18 @@ impl ExecutionPlan for ParquetExec { } fn statistics(&self) -

Re: [I] ListingTableUrl should allow direct construction [datafusion]

2024-09-23 Thread via GitHub
alamb commented on issue #12581: URL: https://github.com/apache/datafusion/issues/12581#issuecomment-2367837308 I agree -- being able to take a direct `ListingTableUrl` seems like a good idea to me

[PR] parquet: Add metrics on operations covered by `time_elapsed_opening` [datafusion]

2024-09-23 Thread via GitHub
progval opened a new pull request, #12585: URL: https://github.com/apache/datafusion/pull/12585 ## Which issue does this PR close? Closes #12584. ## Rationale for this change This allows both Datafusion developers and users to better measure the impact of their optimizat

[PR] Update prost-build requirement from =0.13.2 to =0.13.3 [datafusion]

2024-09-23 Thread via GitHub
dependabot[bot] opened a new pull request, #12587: URL: https://github.com/apache/datafusion/pull/12587 Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CH

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-23 Thread via GitHub
rluvaton closed pull request #12474: feat(function): add greatest function URL: https://github.com/apache/datafusion/pull/12474

Re: [PR] Add Docs and Examples and helper methods to `PhysicalSortExpr` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12589: URL: https://github.com/apache/datafusion/pull/12589#discussion_r1771353687 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1238,20 +1238,13 @@ mod tests { col("int_col").sort(false, true), ]]

[PR] Add Docs and Examples and helper methods to `PhysicalSortExpr` [datafusion]

2024-09-23 Thread via GitHub
alamb opened a new pull request, #12589: URL: https://github.com/apache/datafusion/pull/12589 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/12446 Closes #. ## Rationale for this change Basically while working on http

Re: [PR] Avoid RowConverter for multi column grouping [datafusion]

2024-09-23 Thread via GitHub
jayzhan211 commented on code in PR #12269: URL: https://github.com/apache/datafusion/pull/12269#discussion_r1771365793 ## datafusion/physical-expr-common/src/group_value_row.rs: ## @@ -0,0 +1,393 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Add `field` trait method to `WindowUDFImpl`, remove `return_type`/`nullable` [datafusion]

2024-09-23 Thread via GitHub
Blizzara commented on code in PR #12374: URL: https://github.com/apache/datafusion/pull/12374#discussion_r1771016885 ## datafusion/expr/src/udwf.rs: ## @@ -324,14 +317,8 @@ pub trait WindowUDFImpl: Debug + Send + Sync { hasher.finish() } -/// Allows customizi

[PR] physical-plan: Cast nested group values back to dictionary if necessary [datafusion]

2024-09-23 Thread via GitHub
brancz opened a new pull request, #12586: URL: https://github.com/apache/datafusion/pull/12586 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/12542 ## Rationale for this change Non-nested arrays were already being dictionary encoded

Re: [PR] Fix and Improve Sort Pushdown for Nested Loop and Hash Join [datafusion]

2024-09-23 Thread via GitHub
Dandandan commented on PR #12559: URL: https://github.com/apache/datafusion/pull/12559#issuecomment-2368258117 TY @berkaysynnada

Re: [PR] Fix and Improve Sort Pushdown for Nested Loop and Hash Join [datafusion]

2024-09-23 Thread via GitHub
Dandandan merged PR #12559: URL: https://github.com/apache/datafusion/pull/12559

Re: [PR] Avoid RowConverter for multi column grouping [datafusion]

2024-09-23 Thread via GitHub
eejbyfeldt commented on code in PR #12269: URL: https://github.com/apache/datafusion/pull/12269#discussion_r1771340393 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -0,0 +1,315 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Improve documentation and add `Display` impl to `EquivalenceProperties` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12590: URL: https://github.com/apache/datafusion/pull/12590#discussion_r1771477594 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -77,11 +87,39 @@ use itertools::Itertools; /// └---┴---┘ /// ``` /// -/// where columns `a` and `

Re: [PR] parquet: Add option to cache file metadata [datafusion]

2024-09-23 Thread via GitHub
progval commented on PR #12548: URL: https://github.com/apache/datafusion/pull/12548#issuecomment-2368656133 Closing in favor of #12593, which actually works and doesn't require extensive changes to Datafusion.

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-23 Thread via GitHub
Eason0729 commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2368633798 https://github.com/sqlparser-rs/sqlparser-rs/issues/1439

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-23 Thread via GitHub
milenkovicm commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2368654213 I think you're right @Eason0729. I did not have much time to investigate, but an initial investigation led to `sqlparser`

[PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-23 Thread via GitHub
progval opened a new pull request, #12593: URL: https://github.com/apache/datafusion/pull/12593 ## Which issue does this PR close? Closes #12592. ## Rationale for this change This allows users to, for example, cache the Page Index so it does not need to be parsed every t

[I] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-23 Thread via GitHub
progval opened a new issue, #12592: URL: https://github.com/apache/datafusion/issues/12592 ### Is your feature request related to a problem or challenge? This allows users to implement #12547 (caching metadata, especially the Page Index) themselves without reimplementing all of `Parqu

[I] Add insert_or_update and get_payloads methods to binary_map/binary_view_map [datafusion]

2024-09-23 Thread via GitHub
dmitrybugakov opened a new issue, #12594: URL: https://github.com/apache/datafusion/issues/12594 ### Is your feature request related to a problem or challenge? Currently, in the [datafusion-functions-extra repository](https://github.com/datafusion-contrib/datafusion-functions-extra/bl

Re: [I] Add insert_or_update and get_payloads methods to binary_map/binary_view_map [datafusion]

2024-09-23 Thread via GitHub
dmitrybugakov commented on issue #12594: URL: https://github.com/apache/datafusion/issues/12594#issuecomment-2368692111 take

Re: [I] Concise API to create DataFrame from collection [datafusion]

2024-09-23 Thread via GitHub
timsaucer commented on issue #12574: URL: https://github.com/apache/datafusion/issues/12574#issuecomment-2368671437 This is a great idea. We have some work in `datafusion-python` we might be able to reuse.

[PR] Improve SanityChecker error message [datafusion]

2024-09-23 Thread via GitHub
alamb opened a new pull request, #12595: URL: https://github.com/apache/datafusion/pull/12595 Draft as it builds on https://github.com/apache/datafusion/pull/12590 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/12446 ## Rational

Re: [I] Implement fast min/max accumulator for binary / strings (now it uses the slower path) [datafusion]

2024-09-23 Thread via GitHub
alamb commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2368855897 > I think what @alamb means is that just simply using Vec to store the states will be at least not worse than StringArray + GroupsAccumulatorAdapter, and it is easy to start from.
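
The "just use a Vec to store the states" suggestion quoted above can be sketched like this. It is a minimal illustration under stated assumptions, not DataFusion's accumulator API: `StringMinState`, `update_batch`, and `min` are hypothetical names, and the element type `Option<String>` is a guess, since the type parameters are missing from the quoted text. Each group owns one slot in a plain vector, and a batch update compares every incoming value against the slot for its group.

```rust
/// Hypothetical per-group minimum tracker for string values.
/// `mins[g]` holds the smallest value seen so far for group `g`,
/// or `None` if the group has only seen nulls (or nothing at all).
#[derive(Debug, Default)]
pub struct StringMinState {
    mins: Vec<Option<String>>,
}

impl StringMinState {
    pub fn new() -> Self {
        Self::default()
    }

    /// Folds one batch into the state. `values[i]` belongs to group
    /// `group_indices[i]`; a `None` value is treated as null and skipped.
    /// Panics if a group index is not below `total_num_groups`.
    pub fn update_batch(
        &mut self,
        values: &[Option<&str>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state so that every group has a slot.
        if self.mins.len() < total_num_groups {
            self.mins.resize(total_num_groups, None);
        }
        for (&value, &group) in values.iter().zip(group_indices) {
            if let Some(value) = value {
                // Replace the slot when it is empty or holds a larger string.
                let replace = match &self.mins[group] {
                    Some(current) => value < current.as_str(),
                    None => true,
                };
                if replace {
                    self.mins[group] = Some(value.to_string());
                }
            }
        }
    }

    /// Current minimum for `group`, if any non-null value was seen.
    pub fn min(&self, group: usize) -> Option<&str> {
        self.mins.get(group).and_then(|m| m.as_deref())
    }
}
```

The sketch allocates a new `String` whenever a slot is replaced, which keeps it simple at the cost of extra allocations. Whether that is fast enough is exactly the open question in the thread, so treat this as a starting point.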

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-23 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1771799372 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single val

Re: [PR] Add Docs and Examples and helper methods to `PhysicalSortExpr` [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12589: URL: https://github.com/apache/datafusion/pull/12589

Re: [PR] parquet: Add option to cache file metadata [datafusion]

2024-09-23 Thread via GitHub
progval closed pull request #12548: parquet: Add option to cache file metadata URL: https://github.com/apache/datafusion/pull/12548

Re: [PR] Implement mode function [datafusion]

2024-09-23 Thread via GitHub
dmitrybugakov closed pull request #12385: Implement mode function URL: https://github.com/apache/datafusion/pull/12385

Re: [I] Support `mode` in Aggregation function [datafusion]

2024-09-23 Thread via GitHub
dmitrybugakov commented on issue #12248: URL: https://github.com/apache/datafusion/issues/12248#issuecomment-2368674526 Implemented: [mode.rs](https://github.com/datafusion-contrib/datafusion-functions-extra/blob/f89f200971bc054bb32d2d1f4a9923b622ff8a24/src/mode.rs#L48)

Re: [PR] Implement mode function [datafusion]

2024-09-23 Thread via GitHub
dmitrybugakov commented on PR #12385: URL: https://github.com/apache/datafusion/pull/12385#issuecomment-2368675427 Implemented: [mode.rs](https://github.com/datafusion-contrib/datafusion-functions-extra/blob/f89f200971bc054bb32d2d1f4a9923b622ff8a24/src/mode.rs#L48)

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-23 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2368673265 > I have the same concern. We need to provide comprehensive tests. It would also be better if we observe the equivalence properties in the plans that include any UnionExec.

Re: [I] Potential regression in Schema / nullability calculations after upgrade to 42.0.0 [datafusion]

2024-09-23 Thread via GitHub
itsjunetime commented on issue #12560: URL: https://github.com/apache/datafusion/issues/12560#issuecomment-2368726110 I'm running into this behavior after #11989, specifically seeing schema mismatches where the only thing that is different is that a field's metadata disappears at some point

Re: [I] Write a blog post about implementing StringView in DataFusion [datafusion]

2024-09-23 Thread via GitHub
alamb commented on issue #11603: URL: https://github.com/apache/datafusion/issues/11603#issuecomment-2368773078 Still waiting on a committer to approve https://github.com/apache/datafusion-site/pull/25 and then I will close this ticket

Re: [PR] fix: modulo op with negative zero divisor produces Nan [datafusion-comet]

2024-09-23 Thread via GitHub
kazuyukitanimura commented on PR #585: URL: https://github.com/apache/datafusion-comet/pull/585#issuecomment-2368763676 The main issue was fixed by https://github.com/apache/datafusion-comet/pull/953. The test cases from this PR would still be helpful

[I] Fusing partial aggregation with repartition [datafusion]

2024-09-23 Thread via GitHub
Rachelint opened a new issue, #12596: URL: https://github.com/apache/datafusion/issues/12596 ### Is your feature request related to a problem or challenge? I implemented a PoC in https://github.com/apache/datafusion/pull/12526 and found that this idea can actually improve performance. But fo

Re: [I] Fusing partial aggregation with repartition [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on issue #12596: URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2368776335 take

Re: [I] Fusing partial aggregation with repartition [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on issue #12596: URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2368778083 I will push this forward after the tasks listed in https://github.com/apache/datafusion/issues/11680#issuecomment-2368735093 are finished

Re: [I] Bump maturin version to satisfy conda-forge constraints? [datafusion-python]

2024-09-23 Thread via GitHub
Michael-J-Ward commented on issue #701: URL: https://github.com/apache/datafusion-python/issues/701#issuecomment-2368785873 Resolved by #725. The final arch migrator appears to be something different. https://github.com/conda-forge/datafusion-feedstock/pull/46#issuecomment-2368602285

Re: [I] Bump maturin version to satisfy conda-forge constraints? [datafusion-python]

2024-09-23 Thread via GitHub
Michael-J-Ward closed issue #701: Bump maturin version to satisfy conda-forge constraints? URL: https://github.com/apache/datafusion-python/issues/701

Re: [I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2368735093 I am sure that the performance improves now. But I think we should push this forward after: - #11943 merged, because this PR made big code changes too, may be clever to a

Re: [I] Fusing partial aggregation with repartition [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on issue #12596: URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2368791465 @waynexia may also be interested in this?

Re: [PR] Update introduction.md [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12577: URL: https://github.com/apache/datafusion/pull/12577#issuecomment-2368877930 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Re: [PR] LexRequirement as a struct, instead of a type [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12583: URL: https://github.com/apache/datafusion/pull/12583#discussion_r1771810535 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -264,9 +267,55 @@ pub type LexOrdering = Vec; /// a reference to a lexicographical ordering. pub type Le

Re: [PR] fix: window function range offset should be long instead of int [datafusion-comet]

2024-09-23 Thread via GitHub
huaxingao commented on code in PR #733: URL: https://github.com/apache/datafusion-comet/pull/733#discussion_r1771810211 ## native/core/src/execution/datafusion/planner.rs: ## @@ -1692,16 +1692,33 @@ impl PhysicalPlanner { .and_then(|inner| inner.lower_frame_bound_st

Re: [PR] LexRequirement as a struct, instead of a type [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12583: URL: https://github.com/apache/datafusion/pull/12583

Re: [I] making `LexRequirement` an actual struct [datafusion]

2024-09-23 Thread via GitHub
alamb closed issue #12255: making `LexRequirement` an actual struct URL: https://github.com/apache/datafusion/issues/12255

Re: [PR] Require `Debug` for `AnalyzerRule`, `FunctionRewriter`, and `OptimizerRule` [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12556: URL: https://github.com/apache/datafusion/pull/12556

Re: [PR] Improve documentation and add `Display` impl to `EquivalenceProperties` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12590: URL: https://github.com/apache/datafusion/pull/12590#issuecomment-2368876359 FYI @wiedld

Re: [PR] Add Docs and Examples and helper methods to `PhysicalSortExpr` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12589: URL: https://github.com/apache/datafusion/pull/12589#issuecomment-2368873660 Thanks for the quick review @comphead

Re: [PR] Warn instead of error for unused imports [datafusion]

2024-09-23 Thread via GitHub
alamb merged PR #12588: URL: https://github.com/apache/datafusion/pull/12588

Re: [PR] Require `Debug` for `AnalyzerRule`, `FunctionRewriter`, and `OptimizerRule` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12556: URL: https://github.com/apache/datafusion/pull/12556#issuecomment-2368892765 🚀

Re: [PR] fix: Correct results for grouping sets when columns contain nulls [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12571: URL: https://github.com/apache/datafusion/pull/12571#issuecomment-2368891896 Thank you @eejbyfeldt cc @thinkharderdev as I think you / your team implemented the `GROUPING SETS` implementation originally

Re: [PR] Fix: check ambiguous column reference [datafusion]

2024-09-23 Thread via GitHub
eejbyfeldt commented on code in PR #12467: URL: https://github.com/apache/datafusion/pull/12467#discussion_r1771820737 ## datafusion/sqllogictest/test_files/join.slt: ## @@ -1209,3 +1209,20 @@ drop table t1; statement ok drop table t2; + +# Test SQLancer issue: https://githu

Re: [I] Any plan to support JSON or JSONB? [datafusion]

2024-09-23 Thread via GitHub
alamb commented on issue #7845: URL: https://github.com/apache/datafusion/issues/7845#issuecomment-2368900548 > somehow get filter pushdown / late materialization to work based on the result of a UDF so some columns aren't decompressed (or even aren't fetched) unless they're needed T

Re: [PR] Add `RuntimeEnv::try_new` and deprecate `RuntimeEnv::new` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12566: URL: https://github.com/apache/datafusion/pull/12566#issuecomment-2368913958 > Thanks for the clear and beginner friendly issues @alamb! This was a great way to get introduced to the code, and I'm looking forward to contributing more :D You are welcome -

Re: [PR] docs: :memo: Add expected answers to `DataFrame` method examples [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12564: URL: https://github.com/apache/datafusion/pull/12564#issuecomment-2368913123 📝

Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12471: URL: https://github.com/apache/datafusion/pull/12471#discussion_r1771831141 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -741,7 +741,18 @@ impl ExecutionPlan for ParquetExec { } fn statistics(&self) -> Re

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1771853935 ## datafusion/functions/src/string/common.rs: ## @@ -72,65 +94,126 @@ pub(crate) fn general_trim( }; if use_string_view { -string_view_trim::(fun

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2368977099 How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

Re: [PR] physical-plan: Cast nested group values back to dictionary if necessary [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12586: URL: https://github.com/apache/datafusion/pull/12586#discussion_r1771865616 ## datafusion/physical-plan/src/aggregates/group_values/row.rs: ## @@ -230,6 +231,11 @@ impl GroupValues for GroupValuesRows { } *a

Re: [PR] Support `skewness(x)` in Aggregation function [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12295: URL: https://github.com/apache/datafusion/pull/12295#issuecomment-2368995352 Perhaps we can port this function to https://github.com/datafusion-contrib/datafusion-functions-extra as well now that @dmitrybugakov has started that project

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2368999416 > How about we merge this PR and then you can continue work on the optimizations as follow on PRs? I am checking about https://github.com/apache/datafusion/pull/12395#discus

Re: [I] Implement fast min/max accumulator for binary / strings (now it uses the slower path) [datafusion]

2024-09-23 Thread via GitHub
devanbenz commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2369040854 > > I have had more time to take a look at this and sort of just wrap my head around how `GroupsAccumulatorAdapter` works a bit. I'm seeing that the performance impact is happen

Re: [PR] fix: window function range offset should be long instead of int [datafusion-comet]

2024-09-23 Thread via GitHub
viirya commented on code in PR #733: URL: https://github.com/apache/datafusion-comet/pull/733#discussion_r1771896989 ## native/core/src/execution/datafusion/planner.rs: ## @@ -1692,16 +1692,33 @@ impl PhysicalPlanner { .and_then(|inner| inner.lower_frame_bound_struc

[PR] chore: clarify tarball installation [datafusion-comet]

2024-09-23 Thread via GitHub
comphead opened a new pull request, #959: URL: https://github.com/apache/datafusion-comet/pull/959 ## Which issue does this PR close? Closes #. Describe installation steps from Published Source Release ## Rationale for this change ## What changes ar

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1771904266 ## datafusion/functions/src/string/common.rs: ## @@ -72,65 +94,126 @@ pub(crate) fn general_trim( }; if use_string_view { -string_view_trim::

Re: [PR] Add StringViewArray blogs on the DataFusion blog [datafusion-site]

2024-09-23 Thread via GitHub
alamb commented on PR #25: URL: https://github.com/apache/datafusion-site/pull/25#issuecomment-2369059695 Thank you so much @andygrove 🙏

Re: [I] Write a blog post about implementing StringView in DataFusion [datafusion]

2024-09-23 Thread via GitHub
alamb closed issue #11603: Write a blog post about implementing StringView in DataFusion URL: https://github.com/apache/datafusion/issues/11603

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-23 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2369058367 > > How about we merge this PR and then you can continue work on the optimizations as follow on PRs? > > I am checking about [#12395 (comment)](https://github.com/apache/datafus

Re: [PR] Add StringViewArray blogs on the DataFusion blog [datafusion-site]

2024-09-23 Thread via GitHub
alamb merged PR #25: URL: https://github.com/apache/datafusion-site/pull/25

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2369068388 > > > How about we merge this PR and then you can continue work on the optimizations as follow on PRs? > > > > > > I am checking about [#12395 (comment)](https://github.

Re: [PR] LexRequirement as a struct, instead of a type [datafusion]

2024-09-23 Thread via GitHub
berkaysynnada commented on code in PR #12583: URL: https://github.com/apache/datafusion/pull/12583#discussion_r1770904821 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -264,9 +267,55 @@ pub type LexOrdering = Vec; /// a reference to a lexicographical ordering. pub

Re: [PR] Add IMDB(JOB) Benchmark [2/N] (imdb queries) [datafusion]

2024-09-23 Thread via GitHub
austin362667 commented on code in PR #12529: URL: https://github.com/apache/datafusion/pull/12529#discussion_r1771479073 ## benchmarks/src/imdb/convert.rs: ## @@ -0,0 +1,112 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [PR] Add IMDB(JOB) Benchmark [2/N] (imdb queries) [datafusion]

2024-09-23 Thread via GitHub
austin362667 commented on code in PR #12529: URL: https://github.com/apache/datafusion/pull/12529#discussion_r1771489759 ## benchmarks/src/imdb/mod.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] LexRequirement as a struct, instead of a type [datafusion]

2024-09-23 Thread via GitHub
ngli-me commented on code in PR #12583: URL: https://github.com/apache/datafusion/pull/12583#discussion_r1771548754 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -264,9 +267,55 @@ pub type LexOrdering = Vec; /// a reference to a lexicographical ordering. pub type

[I] Making `LexOrderingRequest` an actual struct [datafusion]

2024-09-23 Thread via GitHub
ngli-me opened a new issue, #12591: URL: https://github.com/apache/datafusion/issues/12591 ### Is your feature request related to a problem or challenge? > Perhaps we could follow a similar pattern for `LexOrderingRef`, too. _Originally posted by @berkaysynnada in https://githu

Re: [PR] Avoid RowConverter for multi column grouping [datafusion]

2024-09-23 Thread via GitHub
jayzhan211 commented on PR #12269: URL: https://github.com/apache/datafusion/pull/12269#issuecomment-2368451062 > One idea I had is that you could defer actually copying the new rows into group_values so rather than calling the function once for each new group, you could call it once per ba

Re: [PR] LexRequirement as a struct, instead of a type [datafusion]

2024-09-23 Thread via GitHub
ngli-me commented on PR #12583: URL: https://github.com/apache/datafusion/pull/12583#issuecomment-2368452049 Made some adjustments, since there was a merge conflict :)

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-23 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2368528145 An update here is I have a WIP PR with tests, etc. https://github.com/apache/datafusion/pull/12562 I am now trying to figure out an appropriate algorithm to implement the

Re: [PR] Warn instead of error for unused imports [datafusion]

2024-09-23 Thread via GitHub
samuelcolvin commented on PR #12588: URL: https://github.com/apache/datafusion/pull/12588#issuecomment-2368078730 See https://github.com/samuelcolvin/datafusion/pull/2 - confirming that CI still fails on unused imports.

[PR] Improve documentation and add `Display` impl to `EquivalenceProperties` [datafusion]

2024-09-23 Thread via GitHub
alamb opened a new pull request, #12590: URL: https://github.com/apache/datafusion/pull/12590 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/12446 Closes #. ## Rationale for this change Basically while working on https://

Re: [I] Implement fast min/max accumulator for binary / strings (now it uses the slower path) [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2368191429 > > @Rachelint your implicit idea of using `Vec` to store the state I think is actually quite interesting and maybe we should try that one first: > > It would at least avoid

Re: [I] Implement fast min/max accumulator for binary / strings (now it uses the slower path) [datafusion]

2024-09-23 Thread via GitHub
Rachelint commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2368191455 > > @Rachelint your implicit idea of using `Vec` to store the state I think is actually quite interesting and maybe we should try that one first: > > It would at least avoid

Re: [PR] Improve documentation and add `Display` impl to `EquivalenceProperties` [datafusion]

2024-09-23 Thread via GitHub
alamb commented on code in PR #12590: URL: https://github.com/apache/datafusion/pull/12590#discussion_r1771415270 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -77,11 +87,39 @@ use itertools::Itertools; /// └---┴---┘ /// ``` /// -/// where columns `a` and `

[I] parquet: Refine `time_elapsed_opening` metric [datafusion]

2024-09-23 Thread via GitHub
progval opened a new issue, #12584: URL: https://github.com/apache/datafusion/issues/12584 ### Is your feature request related to a problem or challenge? For Parquet files, the `time_elapsed_opening` metric encompasses all of (in execution order): 1. Reading and parsing the foo

[PR] No error unused imports [datafusion]

2024-09-23 Thread via GitHub
samuelcolvin opened a new pull request, #12588: URL: https://github.com/apache/datafusion/pull/12588 ## Rationale for this change It's very frustrating when working on datafusion that unused imports cause scripts and tests to fail at build time, and not run. Also, while looking

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-23 Thread via GitHub
berkaysynnada commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2368579422 > An update here is I have a WIP PR with tests, etc. https://github.com/apache/datafusion/pull/12562 I will share my thoughts on it tomorrow > I worry that if

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-23 Thread via GitHub
Eason0729 commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2368595027 I spent some time tracing the code but was unable to fix it. Maybe we need to change some code in `sqlparser`; correct me if I am wrong. Detail To be specific,
