Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-07-06 Thread via GitHub
leoluan2009 closed pull request #442: feat: Support CartesianProductExec in comet URL: https://github.com/apache/datafusion-comet/pull/442 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] feat: Support spark base64 function [datafusion-comet]

2024-07-06 Thread via GitHub
leoluan2009 closed pull request #420: feat: Support spark base64 function URL: https://github.com/apache/datafusion-comet/pull/420

Re: [PR] feat: Support spark unbase64 function [datafusion-comet]

2024-07-06 Thread via GitHub
leoluan2009 closed pull request #425: feat: Support spark unbase64 function URL: https://github.com/apache/datafusion-comet/pull/425

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211708702 > > The datatype of key and value should be known before `invoke`, so we can get the corresponding Builder based on the data type > > Indeed. I'll try to use MapBuilder

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-06 Thread via GitHub
peter-toth commented on PR #11265: URL: https://github.com/apache/datafusion/pull/11265#issuecomment-2211715751 I've added CSE tests in https://github.com/apache/datafusion/pull/11265/commits/1aa88486775266570c7219d490201eb96a8dfc4d.

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-06 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667339145 ## datafusion/sql/src/expr/function.rs: ## @@ -415,9 +415,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { ) -> Result { // check udaf first

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-06 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667339902 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -362,15 +363,17 @@ fn rountrip_aggregate() -> Result<()> { false, )?],

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-06 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667340445 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -430,3 +428,176 @@ impl NthValueAccumulator { Ok(()) } } + +/// This is a wrapper struct

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-06 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667342415 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -19,152 +19,150 @@ //! that can evaluated at runtime during query execution use std::any::Any; -use

Re: [I] Error when building user guide: UndefinedError("'logo' is undefined") [datafusion]

2024-07-06 Thread via GitHub
alamb commented on issue #5597: URL: https://github.com/apache/datafusion/issues/5597#issuecomment-2211729772 I haven't seen this happen recently: ```shell (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2/docs$ ./build.sh Running Sphinx v7.2.6 making output dire

Re: [I] Error when building user guide: UndefinedError("'logo' is undefined") [datafusion]

2024-07-06 Thread via GitHub
alamb closed issue #5597: Error when building user guide: UndefinedError("'logo' is undefined") URL: https://github.com/apache/datafusion/issues/5597

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-06 Thread via GitHub
jcsherin commented on PR #11287: URL: https://github.com/apache/datafusion/pull/11287#issuecomment-2211730373 Summary: - Marked as TODO: make the nullability of list field element configurable. This can be completed in a follow up PR after #11063. - Extract `merge_ordered_arrays`

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-06 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667067780 ## docs/source/library-user-guide/using-the-sql-api.md: ## @@ -29,16 +29,15 @@ using the [`SessionContext::sql`] method. For lower level control such as preventing

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
goldmedal commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211734625 > btw, there is probably a downside for using MapBuilder, since we need to create different builder for different types (therefore many macros) which easily causes code bloat.

[PR] Move configuration information out of example usage page [datafusion]

2024-07-06 Thread via GitHub
alamb opened a new pull request, #11300: URL: https://github.com/apache/datafusion/pull/11300 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11172 ## Rationale for this change The current example usage page https://datafusion.

Re: [PR] Move configuration information out of example usage page [datafusion]

2024-07-06 Thread via GitHub
alamb commented on code in PR #11300: URL: https://github.com/apache/datafusion/pull/11300#discussion_r1667348573 ## docs/source/index.rst: ## @@ -41,13 +41,16 @@ DataFusion offers SQL and Dataframe APIs, excellent CSV, Parquet, JSON, and Avro, extensive customization, and a gr

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-06 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667344514 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFrame

[PR] AggregateExec: Take grouping sets into account for InputOrderMode [datafusion]

2024-07-06 Thread via GitHub
thinkharderdev opened a new pull request, #11301: URL: https://github.com/apache/datafusion/pull/11301 ## Which issue does this PR close? Closes #11291 ## Rationale for this change Fixes incorrect `InputOrderMode` when aggregation has grouping sets

[PR] Upgrade to arrow 52.1.0 (and fix clippy issues on main) [datafusion]

2024-07-06 Thread via GitHub
alamb opened a new pull request, #11302: URL: https://github.com/apache/datafusion/pull/11302 ## Which issue does this PR close? N/A ## Rationale for this change https://crates.io/crates/arrow/52.1.0 was released an hour ago and causes clippy to start failing ## W

Re: [PR] Fix count() docs around including null values [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11293: URL: https://github.com/apache/datafusion/pull/11293

Re: [PR] Remove unnecessary qualified names [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11292: URL: https://github.com/apache/datafusion/pull/11292

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-06 Thread via GitHub
alamb commented on PR #11278: URL: https://github.com/apache/datafusion/pull/11278#issuecomment-2211743489 Thanks again!

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11278: URL: https://github.com/apache/datafusion/pull/11278

Re: [PR] Fix running examples readme [datafusion]

2024-07-06 Thread via GitHub
alamb commented on PR #11225: URL: https://github.com/apache/datafusion/pull/11225#issuecomment-2211743242 Thanks again @findepi

Re: [PR] Fix running examples readme [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11225: URL: https://github.com/apache/datafusion/pull/11225

Re: [PR] feat: enable "substring" as a UDF in addition to "substr" [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11277: URL: https://github.com/apache/datafusion/pull/11277

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
alamb commented on PR #11276: URL: https://github.com/apache/datafusion/pull/11276#issuecomment-2211743850 This makes sense to me -- perhaps someone else who is familiar with the join code like @viirya @comphead or @korowa could give this a double check to verify that hash join does pres

Re: [PR] Minor: Add `ConstExpr::from` and use in physical optimizer [datafusion]

2024-07-06 Thread via GitHub
alamb commented on PR #11283: URL: https://github.com/apache/datafusion/pull/11283#issuecomment-2211743628 Thank you for the review @mustafasrepo

Re: [PR] Minor: Add `ConstExpr::from` and use in physical optimizer [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11283: URL: https://github.com/apache/datafusion/pull/11283

Re: [PR] AggregateExec: Take grouping sets into account for InputOrderMode [datafusion]

2024-07-06 Thread via GitHub
alamb commented on PR #11301: URL: https://github.com/apache/datafusion/pull/11301#issuecomment-2211744019 I think clippy is failing on this PR due to https://github.com/apache/datafusion/pull/11302

Re: [PR] AggregateExec: Take grouping sets into account for InputOrderMode [datafusion]

2024-07-06 Thread via GitHub
alamb commented on code in PR #11301: URL: https://github.com/apache/datafusion/pull/11301#discussion_r1667352742 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -369,14 +369,21 @@ impl AggregateExec { new_requirement.extend(req); new_requirement = col

Re: [PR] Implement TPCH substrait integration teset, support tpch_3 [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11298: URL: https://github.com/apache/datafusion/pull/11298

Re: [PR] use safe cast in propagate_constraints [datafusion]

2024-07-06 Thread via GitHub
alamb commented on code in PR #11297: URL: https://github.com/apache/datafusion/pull/11297#discussion_r1667353718 ## datafusion/sqllogictest/test_files/cast.slt: ## @@ -69,3 +69,16 @@ query ? SELECT CAST(MAKE_ARRAY() AS VARCHAR[]) [] + +statement ok +create table t0(v0 B

[I] Review the behavior of `count` with multiple arguments [datafusion]

2024-07-06 Thread via GitHub
jonahgao opened a new issue, #11303: URL: https://github.com/apache/datafusion/issues/11303 ### Is your feature request related to a problem or challenge? Datafusion supports this type of syntax, such as `count(a, b)` and `count(distinct a, b)`. However, its behavior may not be wel

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-06 Thread via GitHub
jonahgao commented on code in PR #11256: URL: https://github.com/apache/datafusion/pull/11256#discussion_r1667354250 ## datafusion/expr/src/expr_schema.rs: ## @@ -322,10 +322,16 @@ impl ExprSchemable for Expr { } } Expr::Cast(Cast { exp

[I] Consider renaming `UserDefinedSQLPlanner` to `ExprPlanner` [datafusion]

2024-07-06 Thread via GitHub
alamb opened a new issue, #11304: URL: https://github.com/apache/datafusion/issues/11304 ### Is your feature request related to a problem or challenge? @samuelcolvin notes on https://github.com/apache/datafusion/issues/11207 https://github.com/apache/datafusion/issues/11207#issuecomm

Re: [I] [Epic] Complete pulling out special SQL planning from the Sql Parser [datafusion]

2024-07-06 Thread via GitHub
alamb commented on issue #11207: URL: https://github.com/apache/datafusion/issues/11207#issuecomment-2211746833 > `ExprPlanner` sounds good. Filed https://github.com/apache/datafusion/issues/11304

[I] @jayzhan211 do you suggest we have one planner for each module (like a `UnicodePlanner` rather than a `PositionPlanner`? [datafusion]

2024-07-06 Thread via GitHub
alamb opened a new issue, #11305: URL: https://github.com/apache/datafusion/issues/11305 @jayzhan211 do you suggest we have one planner for each module (like a `UnicodePlanner` rather than a `PositionPlanner`? I think this is similar to what you are proposing with `Co

Re: [PR] Implement user defined planner for position [datafusion]

2024-07-06 Thread via GitHub
alamb merged PR #11243: URL: https://github.com/apache/datafusion/pull/11243

Re: [I] Implement user defined planner for sql_position_to_expr [datafusion]

2024-07-06 Thread via GitHub
alamb closed issue #11242: Implement user defined planner for sql_position_to_expr URL: https://github.com/apache/datafusion/issues/11242

Re: [PR] Implement user defined planner for position [datafusion]

2024-07-06 Thread via GitHub
alamb commented on code in PR #11243: URL: https://github.com/apache/datafusion/pull/11243#discussion_r1667355725 ## datafusion/functions/src/unicode/planner.rs: ## @@ -0,0 +1,36 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Implement user defined planner for position [datafusion]

2024-07-06 Thread via GitHub
alamb commented on PR #11243: URL: https://github.com/apache/datafusion/pull/11243#issuecomment-2211748531 🚀 let's consolidate the planners as a follow on PR

Re: [PR] Upgrade to arrow 52.1.0 (and fix clippy issues on main) [datafusion]

2024-07-06 Thread via GitHub
andygrove merged PR #11302: URL: https://github.com/apache/datafusion/pull/11302

Re: [PR] WIP Upgrade to arrow/parquet `52.1.0` [datafusion]

2024-07-06 Thread via GitHub
andygrove closed pull request #11202: WIP Upgrade to arrow/parquet `52.1.0` URL: https://github.com/apache/datafusion/pull/11202

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211754053 > > btw, there is probably a downside for using MapBuilder, since we need to create different builder for different types (therefore many macros) which easily causes code blo

[PR] chore: Convert Rust project into a workspace [datafusion-comet]

2024-07-06 Thread via GitHub
andygrove opened a new pull request, #637: URL: https://github.com/apache/datafusion-comet/pull/637 ## Which issue does this PR close? This is a first step towards splitting the Comet code into multiple crates, such as a `datafusion-spark-expr` crate as suggested in https://g

Re: [PR] Add user_defined_sql_planners(..) to FunctionRegistry [datafusion]

2024-07-06 Thread via GitHub
Omega359 commented on PR #11296: URL: https://github.com/apache/datafusion/pull/11296#issuecomment-2211773493 I'm ok with that as long as the register function name (and I suppose deregister in the future) mirrors the naming scheme. I can file a followup issue to have that renamed.

Re: [PR] chore: Convert Rust project into a workspace [datafusion-comet]

2024-07-06 Thread via GitHub
codecov-commenter commented on PR #637: URL: https://github.com/apache/datafusion-comet/pull/637#issuecomment-2211781462 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/637?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[I] "Filter predicates should not be aliased." seems to strict [datafusion]

2024-07-06 Thread via GitHub
samuelcolvin opened a new issue, #11306: URL: https://github.com/apache/datafusion/issues/11306 ### Describe the bug Aliases are not allowed as filter predicates. As per https://github.com/datafusion-contrib/datafusion-functions-json/pull/26#discussion_r1664566127 @alamb sugge

Re: [I] Review the behavior of `count` with multiple arguments [datafusion]

2024-07-06 Thread via GitHub
findepi commented on issue #11303: URL: https://github.com/apache/datafusion/issues/11303#issuecomment-2211795107 Related to this, we could maybe support zero-arg `count()`. For 2+ args for count, I as a user would prefer to use filtered aggregation. It's then obvious whether I am co
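
To make the ambiguity raised in this comment concrete, here is a minimal Python sketch (not DataFusion code) of two plausible readings of a two-argument count: counting rows where every argument is non-null, versus rows where at least one is. The function names and sample rows are hypothetical, and which reading DataFusion actually implements is exactly what issue #11303 asks to review. A filtered aggregation spells the predicate out, so the reader does not have to guess.

```python
# Hypothetical sample rows of (a, b); None stands in for SQL NULL.
ROWS = [(1, 10), (2, None), (None, 30), (None, None)]


def count_all_non_null(rows):
    """Reading 1: count rows where every argument is non-null."""
    return sum(1 for a, b in rows if a is not None and b is not None)


def count_any_non_null(rows):
    """Reading 2: count rows where at least one argument is non-null."""
    return sum(1 for a, b in rows if a is not None or b is not None)


def count_filtered(rows, predicate):
    """Filtered aggregation: the caller states the predicate explicitly."""
    return sum(1 for row in rows if predicate(row))


if __name__ == "__main__":
    print(count_all_non_null(ROWS))
    print(count_any_non_null(ROWS))
    # The explicit predicate removes the ambiguity between the two readings.
    print(count_filtered(ROWS, lambda r: r[0] is not None and r[1] is not None))
```

With an explicit predicate, the two readings become two visibly different queries rather than one call whose meaning depends on the engine.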

Re: [PR] Remove unnecessary qualified names [datafusion]

2024-07-06 Thread via GitHub
findepi commented on PR #11292: URL: https://github.com/apache/datafusion/pull/11292#issuecomment-2211795858 Thanks for review and merge, @alamb ! RustRover has nice inspections and can detect unnecessary use of qualified names. Sadly it can't fix them automatically (yet), so it's curr

Re: [PR] Fix count() docs around including null values [datafusion]

2024-07-06 Thread via GitHub
findepi commented on PR #11293: URL: https://github.com/apache/datafusion/pull/11293#issuecomment-2211796154 thanks for keeping me honest!

Re: [PR] Change array agg result from empty list to null if no row qualifed [datafusion]

2024-07-06 Thread via GitHub
findepi commented on code in PR #11299: URL: https://github.com/apache/datafusion/pull/11299#discussion_r1667382661 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2744,28 +2735,98 @@ SELECT ARRAY_AGG([1]) [[1]] -# test_approx_percentile_cont_decimal_support

[PR] allow alias in predicate [datafusion]

2024-07-06 Thread via GitHub
samuelcolvin opened a new pull request, #11307: URL: https://github.com/apache/datafusion/pull/11307 ## Which issue does this PR close? fix #11306. ## Rationale for this change See https://github.com/datafusion-contrib/datafusion-functions-json/pull/26 and #11306

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-06 Thread via GitHub
efredine commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667386605 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFr

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-06 Thread via GitHub
efredine commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667387330 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFr

Re: [PR] Tsaucer/find window fn [datafusion-python]

2024-07-06 Thread via GitHub
andygrove merged PR #747: URL: https://github.com/apache/datafusion-python/pull/747

Re: [PR] Add user_defined_sql_planners(..) to FunctionRegistry [datafusion]

2024-07-06 Thread via GitHub
Omega359 commented on PR #11296: URL: https://github.com/apache/datafusion/pull/11296#issuecomment-2211816012 Rename update pushed.

Re: [PR] use safe cast in propagate_constraints [datafusion]

2024-07-06 Thread via GitHub
Lordworms commented on code in PR #11297: URL: https://github.com/apache/datafusion/pull/11297#discussion_r1667393385 ## datafusion/sqllogictest/test_files/cast.slt: ## @@ -69,3 +69,16 @@ query ? SELECT CAST(MAKE_ARRAY() AS VARCHAR[]) [] + +statement ok +create table t0(

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
goldmedal commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211819196 > `make_map_batch` version is way faster, we should go with that one. > > Upd: I tried to remove clone in MapBuilder version, it still seems slower than manually constru
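
For context on the builder-versus-manual comparison in this thread: Arrow's map layout stores all entries as flattened key and value child arrays, plus an offsets buffer marking where each row's entries begin and end. The Python sketch below illustrates that layout only. `build_map_layout` and `get_row` are hypothetical helpers for illustration, not the `make_map_batch` function mentioned in the thread, and the sketch says nothing about the relative performance being discussed.

```python
def build_map_layout(rows):
    """Flatten per-row (key, value) pairs into keys, values, and offsets.

    offsets[i]..offsets[i + 1] delimits the entries that belong to row i.
    """
    keys, values, offsets = [], [], [0]
    for row in rows:
        for key, value in row:
            keys.append(key)
            values.append(value)
        # Record where this row's entries end (and the next row's begin).
        offsets.append(len(keys))
    return keys, values, offsets


def get_row(keys, values, offsets, i):
    """Recover row i's entries from the flattened layout."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(keys[start:end], values[start:end]))


if __name__ == "__main__":
    sample = [[("a", 1), ("b", 2)], [], [("c", 3)]]
    k, v, o = build_map_layout(sample)
    print(k, v, o)
    print(get_row(k, v, o, 0))
```

Because the layout is just three flat buffers, it can be assembled directly from existing key and value arrays without appending entries one at a time.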

[I] Generate well-indented SQL from LogicalPlan [datafusion]

2024-07-06 Thread via GitHub
edmondop opened a new issue, #11308: URL: https://github.com/apache/datafusion/issues/11308 ### Is your feature request related to a problem or challenge? DataFusion provides the capability to "unparse" a logical plan into SQL via the `unparser` module in the `sql` crate (see https:/

Re: [I] Improve performance of DataPage statistics extraction using StringBuilder [datafusion]

2024-07-06 Thread via GitHub
Rachelint commented on issue #11281: URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2211827993 It seems too many temporary `Vec`s are created; maybe we can eliminate them too?

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-06 Thread via GitHub
dharanad commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667401047 ## datafusion/expr/src/planner.rs: ## @@ -133,6 +133,17 @@ pub trait UserDefinedSQLPlanner: Send + Sync { fn plan_extract(&self, args: Vec) -> Result>> {

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-06 Thread via GitHub
dharanad commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667401277 ## datafusion/functions/src/core/planner.rs: ## @@ -38,3 +40,28 @@ impl UserDefinedSQLPlanner for CoreFunctionPlanner { Ok(PlannerResult::Planned(named_s

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-06 Thread via GitHub
dharanad commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667401435 ## datafusion/sql/src/expr/mod.rs: ## @@ -629,6 +630,41 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } } +/// Parses a struct(..) express

[I] implement SubqueryType::SetPredicate in Substrait [datafusion]

2024-07-06 Thread via GitHub
Lordworms opened a new issue, #11309: URL: https://github.com/apache/datafusion/issues/11309 ### Is your feature request related to a problem or challenge? also related to #10710 ### Describe the solution you'd like _No response_ ### Describe alternatives you've c

Re: [PR] feat: Use unified allocator for execution iterators [datafusion-comet]

2024-07-06 Thread via GitHub
viirya commented on code in PR #613: URL: https://github.com/apache/datafusion-comet/pull/613#discussion_r1667405223 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQuerySuite.scala: ## @@ -158,6 +158,11 @@ class CometTPCDSQuerySuite conf.set(CometConf.COMET_EXEC_ALL

[PR] build(deps): bump arrow from 52.0.0 to 52.1.0 [datafusion-python]

2024-07-06 Thread via GitHub
dependabot[bot] opened a new pull request, #748: URL: https://github.com/apache/datafusion-python/pull/748 Bumps [arrow](https://github.com/apache/arrow-rs) from 52.0.0 to 52.1.0. Changelog Sourced from https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md arrow's chang

Re: [PR] initial prettier unparse [datafusion]

2024-07-06 Thread via GitHub
MohamedAbdeen21 commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1667413972 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".

[PR] minor: Add `PhysicalSortExpr::new` [datafusion]

2024-07-06 Thread via GitHub
andygrove opened a new pull request, #11310: URL: https://github.com/apache/datafusion/pull/11310 ## Which issue does this PR close? N/A ## Rationale for this change I am working on some perf tests where I manually construct physical plans and noticed tha

Re: [I] Separate Spark-compatibility expressions into a library [datafusion-comet]

2024-07-06 Thread via GitHub
andygrove commented on issue #630: URL: https://github.com/apache/datafusion-comet/issues/630#issuecomment-2212034526 I created https://github.com/apache/datafusion-comet/pull/637 so that we can start splitting code out into separate crates. I also started looking at what would be in

Re: [I] Review the behavior of `count` with multiple arguments [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11303: URL: https://github.com/apache/datafusion/issues/11303#issuecomment-2212053813 > Related to this, we could maybe support zero-arg `count()`. > > For 2+ args for count, I as a user would prefer to use filtered aggregation. It's then obvious whethe

Re: [PR] Change array agg result from empty list to null if no row qualifed [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on code in PR #11299: URL: https://github.com/apache/datafusion/pull/11299#discussion_r1667496202 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -1753,31 +1753,12 @@ NULL 4 29 1.260869565217 123 -117 23 NULL 5 -194 -13.857142857143 118 -101 14

[PR] Implement TPCH substrait integration teset, support tpch_4 and tpch_5 [datafusion]

2024-07-06 Thread via GitHub
Lordworms opened a new pull request, #11311: URL: https://github.com/apache/datafusion/pull/11311 ## Which issue does this PR close? part of #10710 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [PR] Implement TPCH substrait integration teset, support tpch_4 and tpch_5 [datafusion]

2024-07-06 Thread via GitHub
Lordworms commented on PR #11311: URL: https://github.com/apache/datafusion/pull/11311#issuecomment-2212056499 not much to change for these two queries.

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2212091914 > > `make_map_batch` version is way faster, we should go with that one. > > Upd: I tried to remove clone in MapBuilder version, it still seems slower than manually construc

Re: [PR] Better Cast name for display [datafusion]

2024-07-06 Thread via GitHub
github-actions[bot] closed pull request #10276: Better Cast name for display URL: https://github.com/apache/datafusion/pull/10276

Re: [PR] RFC: Make it easier to call window functions via expression API (and add example) [datafusion]

2024-07-06 Thread via GitHub
github-actions[bot] commented on PR #6746: URL: https://github.com/apache/datafusion/pull/6746#issuecomment-2212242630 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

[I] Enhance `named_struct` to allow column as key [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 opened a new issue, #11312: URL: https://github.com/apache/datafusion/issues/11312 ### Is your feature request related to a problem or challenge? Failed test ``` statement ok create table t(name varchar, val int) as values ('a', 1), ('b', 2), ('c', 3); query

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
goldmedal commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2212299543 > what is your code for benchmarking? > Here's my testing code. https://github.com/goldmedal/datafusion/blob/feature/11268-scalar-funciton-map-v2/datafusion/func

[I] Let `CASE` expression only accept boolean in `WHEN` branch [datafusion]

2024-07-06 Thread via GitHub
2010YOUY01 opened a new issue, #11313: URL: https://github.com/apache/datafusion/issues/11313 ### Is your feature request related to a problem or challenge? Now `CASE` expression's `WHEN` branch can accept numbers, maybe it's better to let it only accept boolean due to: 1. Avoid co

Re: [I] Enhance `named_struct` to support keys as multiple rows [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 closed issue #11312: Enhance `named_struct` to support keys as multiple rows URL: https://github.com/apache/datafusion/issues/11312

Re: [I] Enhance `named_struct` to support keys as multiple rows [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11312: URL: https://github.com/apache/datafusion/issues/11312#issuecomment-2212305093 It is not possible to get the exact array in `return_type_from_exprs`

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667600380 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -19,152 +19,150 @@ //! that can evaluated at runtime during query execution use std::any::Any; -us

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2212311916 You can also do the benchmarking in `datafusion/functions/benches/map.rs`. Add this to cargo.toml, and run with cargo bench --bench map ```rust [[bench]] harnes

Re: [PR] initial prettier unparse [datafusion]

2024-07-06 Thread via GitHub
phillipleblanc commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1667601690 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".\

Re: [I] Let `CASE` expression only accept boolean in `WHEN` branch [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11313: URL: https://github.com/apache/datafusion/issues/11313#issuecomment-2212313458 We follow not only Postgres but also DuckDB and others. We have the same result as DuckDB in this case, so I think we don't need to convert it to an error.

Re: [I] Improve performance of DataPage statistics extraction using StringBuilder [datafusion]

2024-07-06 Thread via GitHub
Rachelint commented on issue #11281: URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212319819 I got strange results from my PoC... After @efredine fixed the `filter_map` bug in #11295, we can use `StringArray::from_iter` to replace `collect + StringArray::from`. A

[PR] Demonstrate unions can't be null [datafusion]

2024-07-06 Thread via GitHub
samuelcolvin opened a new pull request, #11314: URL: https://github.com/apache/datafusion/pull/11314 Demonstrates #11162 - union columns (and scalars) never match an `is null` check. This applies to both sparse and dense unions. Similarly union columns always match a `is not nu

Re: [I] Union columns can never be `NULL` [datafusion]

2024-07-06 Thread via GitHub
samuelcolvin commented on issue #11162: URL: https://github.com/apache/datafusion/issues/11162#issuecomment-2212326030 See #11314 as a demonstration of the problem for both dense and sparse unions. After a bit of investigation, the issue lies in the first instance with https:

Re: [I] Union columns can never be `NULL` [datafusion]

2024-07-06 Thread via GitHub
jayzhan211 commented on issue #11162: URL: https://github.com/apache/datafusion/issues/11162#issuecomment-2212329965 > Have custom logic for unions that looks up the child array to determine if the value is null +1 for second option. I think we should check the children's nullability.

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
viirya commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667614074 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -1107,6 +,8 @@ struct HashJoinStream { batch_size: usize, /// Scratch space for computing ha

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
viirya commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667615196 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1411,6 +1424,63 @@ where .collect::>() } +/// Appends probe indices in order by considering the g

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
viirya commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667615290 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1411,6 +1424,63 @@ where .collect::>() } +/// Appends probe indices in order by considering the g

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
viirya commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667615408 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1411,6 +1424,63 @@ where .collect::>() } +/// Appends probe indices in order by considering the g

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
viirya commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667615450 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1411,6 +1424,63 @@ where .collect::>() } +/// Appends probe indices in order by considering the g

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-06 Thread via GitHub
viirya commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667615567 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1411,6 +1424,63 @@ where .collect::>() } +/// Appends probe indices in order by considering the g
