[PR] Minor: Remove clone in optimizer [datafusion]

2024-07-07 Thread via GitHub
jayzhan211 opened a new pull request, #11315: URL: https://github.com/apache/datafusion/pull/11315 ## Which issue does this PR close? Part of #4628 and #9637 Closes #. ## Rationale for this change There are still clones left to be removed ##

Re: [PR] Implement UDF Plan [datafusion]

2024-07-07 Thread via GitHub
xinlifoobar closed pull request #11263: Implement UDF Plan URL: https://github.com/apache/datafusion/pull/11263 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [I] Let `CASE` expression only accept boolean in `WHEN` branch [datafusion]

2024-07-07 Thread via GitHub
2010YOUY01 closed issue #11313: Let `CASE` expression only accept boolean in `WHEN` branch URL: https://github.com/apache/datafusion/issues/11313

Re: [I] Let `CASE` expression only accept boolean in `WHEN` branch [datafusion]

2024-07-07 Thread via GitHub
2010YOUY01 commented on issue #11313: URL: https://github.com/apache/datafusion/issues/11313#issuecomment-2212372720 > We not only follow Postgres but also DuckDB or others. We usually follow the majority, otherwise we choose either of them (usually Postgres, but not restricted to) >

[I] Preserve LogicalPlan to avoid clone for analyzer [datafusion]

2024-07-07 Thread via GitHub
jayzhan211 opened a new issue, #11316: URL: https://github.com/apache/datafusion/issues/11316 ### Is your feature request related to a problem or challenge? Step forward to #4628 I found that we need to clone plan if analyzer has context error https://github.com/apache/

Re: [I] Run miri checks in CI [datafusion-comet]

2024-07-07 Thread via GitHub
vaibhawvipul commented on issue #634: URL: https://github.com/apache/datafusion-comet/issues/634#issuecomment-2212389298 @andygrove This is great! Thank you for implementing this. I have a question though, why did we decide to go with Miri? Why not say verus or kani?

Re: [PR] AggregateExec: Take grouping sets into account for InputOrderMode [datafusion]

2024-07-07 Thread via GitHub
thinkharderdev merged PR #11301: URL: https://github.com/apache/datafusion/pull/11301

Re: [I] Incorrect results in aggregation queries with grouping sets [datafusion]

2024-07-07 Thread via GitHub
thinkharderdev closed issue #11291: Incorrect results in aggregation queries with grouping sets URL: https://github.com/apache/datafusion/issues/11291

Re: [PR] Feat: Implement hf:// / "hugging face" integration in datafusion-cli [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #10792: URL: https://github.com/apache/datafusion/pull/10792#issuecomment-2212414247 This is still on my list, but I am behind in my reviews

Re: [I] Data set which is much bigger than RAM [datafusion]

2024-07-07 Thread via GitHub
alamb commented on issue #10897: URL: https://github.com/apache/datafusion/issues/10897#issuecomment-2212415404 Thanks @korowa -- this analysis makes sense (aka that there is some constant overhead per active partition) @Smotrov does this match your dataset? As in how many partitions

Re: [PR] Implement `DynamicFileSchemaProvider` in the core [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11035: URL: https://github.com/apache/datafusion/pull/11035#issuecomment-2212415792 I apologize for not finding time yet to re-review this PR. It contains a substantial seeming API change and I need to find enough contiguous review time to review it carefully. It is o

Re: [PR] RFC: Make it easier to call window functions via expression API (and add example) [datafusion]

2024-07-07 Thread via GitHub
alamb closed pull request #6746: RFC: Make it easier to call window functions via expression API (and add example) URL: https://github.com/apache/datafusion/pull/6746

Re: [PR] RFC: Make it easier to call window functions via expression API (and add example) [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #6746: URL: https://github.com/apache/datafusion/pull/6746#issuecomment-2212416613 We can revive this PR / its API when someone has time to work on it In case anyone is following along, @jayzhan211 added a really nice trait for working with aggregate functions. M

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-07-07 Thread via GitHub
alamb commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2212416660 In case anyone is following along, @jayzhan211 added a really nice trait for working with aggregate functions. Maybe we can do something similar for window functions eventually

Re: [PR] Remove unnecessary qualified names [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11292: URL: https://github.com/apache/datafusion/pull/11292#issuecomment-2212419096 > Thanks for review and merge, @alamb ! RustRover has nice inspections and can detect unnecessary use of qualified names. Sadly it can't fix them automatically (yet), so it's currently

Re: [I] DataFusion weekly project plan (Andrew Lamb) - July 1, 2024 [datafusion]

2024-07-07 Thread via GitHub
alamb commented on issue #11190: URL: https://github.com/apache/datafusion/issues/11190#issuecomment-2212428384 Review Queue Arrow - [ ] https://github.com/apache/arrow-rs/pull/5486 DataFusion - [ ] https://github.com/apache/datafusion/pull/11035 - [ ] https://github.co

Re: [I] Review the behavior of `count` with multiple arguments [datafusion]

2024-07-07 Thread via GitHub
jonahgao commented on issue #11303: URL: https://github.com/apache/datafusion/issues/11303#issuecomment-2212431032 This feature was introduced by #5908. Spark also supports it, and its behavior seems to be consistent with MySQL. So I think we can follow [Spark](https://spark.apache.org/docs
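
As a rough illustration of the behavior under discussion: Spark documents `count(expr[, expr...])` as returning the number of rows for which all supplied expressions are non-null. The sketch below models only that rule in plain Rust. The function name `count_all_non_null` and the row representation are hypothetical and are not part of DataFusion or Spark.

```rust
/// Hypothetical helper (not a DataFusion API): counts the rows in which
/// every supplied argument value is non-null. `None` models SQL NULL.
fn count_all_non_null(rows: &[Vec<Option<i64>>]) -> usize {
    rows.iter()
        // Keep a row only when none of its argument values is NULL.
        .filter(|row| row.iter().all(|value| value.is_some()))
        .count()
}
```

Under this rule a row such as `(1, NULL)` does not contribute to `count(a, b)`, while it would still contribute to `count(a)`.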

[PR] Made UserDefinedFunctionPlanner to uniform the usages [datafusion]

2024-07-07 Thread via GitHub
xinlifoobar opened a new pull request, #11318: URL: https://github.com/apache/datafusion/pull/11318 ## Which issue does this PR close? Closes #11305 ## Rationale for this change I moved part of the code from #11263 into this PR to reduce the # of planner

[PR] Use Builder to improve stats convert performance [datafusion]

2024-07-07 Thread via GitHub
Rachelint opened a new pull request, #11319: URL: https://github.com/apache/datafusion/pull/11319 ## Which issue does this PR close? Closes #11281 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] feat: Support `unnest` cross join with table in `FROM` clause [datafusion]

2024-07-07 Thread via GitHub
jonahgao commented on issue #9394: URL: https://github.com/apache/datafusion/issues/9394#issuecomment-2212453652 > I took a look at Postgres cross join behavior, it looks like the order of the result is not deterministic either (meaning it feel free to decide which table is small and should

Re: [I] feat: Support `unnest` cross join with table in `FROM` clause [datafusion]

2024-07-07 Thread via GitHub
jonahgao commented on issue #9394: URL: https://github.com/apache/datafusion/issues/9394#issuecomment-2212454419 Let's close this issue for now; we can reopen it if there are any other problems.

Re: [I] feat: Support `unnest` cross join with table in `FROM` clause [datafusion]

2024-07-07 Thread via GitHub
jonahgao closed issue #9394: feat: Support `unnest` cross join with table in `FROM` clause URL: https://github.com/apache/datafusion/issues/9394

Re: [PR] Implement `DynamicFileSchemaProvider` in the core [datafusion]

2024-07-07 Thread via GitHub
goldmedal commented on PR #11035: URL: https://github.com/apache/datafusion/pull/11035#issuecomment-2212457826 > I apologize for not finding time yet to re-review this PR. It contains a substantial seeming API change and I need to find enough contiguous review time to review it carefully. I

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-07 Thread via GitHub
korowa commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667701640 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -3813,3 +3813,58 @@ logical_plan 01)SubqueryAlias: b 02)--Projection: Int64(1) AS a 03)EmptyRelation +

Re: [PR] Move configuration information out of example usage page [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11300: URL: https://github.com/apache/datafusion/pull/11300#discussion_r1667348709 ## docs/source/user-guide/example-usage.md: ## @@ -33,29 +33,6 @@ datafusion = "latest_version" tokio = { version = "1.0", features = ["rt-multi-thread"] } ``` -

Re: [I] Improve performance of DataPage statistics extraction using StringBuilder [datafusion]

2024-07-07 Thread via GitHub
efredine commented on issue #11281: URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212470949 Yes - the primary (initial) problem is that the collection needs to be built so that it owns the references to the items but we want to do that without creating any intermediat

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-07 Thread via GitHub
tshauck commented on code in PR #11229: URL: https://github.com/apache/datafusion/pull/11229#discussion_r1667719665 ## datafusion/functions-aggregate/src/aggregate_function_planner.rs: ## @@ -0,0 +1,80 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[PR] feat: Create new `datafusion-comet-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-07 Thread via GitHub
andygrove opened a new pull request, #638: URL: https://github.com/apache/datafusion-comet/pull/638 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] initial prettier unparse [datafusion]

2024-07-07 Thread via GitHub
MohamedAbdeen21 commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1667722385 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".

Re: [I] Improve performance of DataPage statistics extraction using StringBuilder [datafusion]

2024-07-07 Thread via GitHub
Rachelint commented on issue #11281: URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212499867 > Yes - the primary (initial) problem is that the collection needs to be built so that it owns the references to the items but we want to do that without creating any intermed

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-07 Thread via GitHub
ozankabak commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667728894 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1411,6 +1424,63 @@ where .collect::>() } +/// Appends probe indices in order by considering th

Re: [PR] Introduce user defined SQL planner API [datafusion]

2024-07-07 Thread via GitHub
rtyler commented on code in PR #11180: URL: https://github.com/apache/datafusion/pull/11180#discussion_r1667729599 ## datafusion/sql/src/expr/mod.rs: ## @@ -341,7 +278,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } }; -

Re: [PR] Add user_defined_sql_planners(..) to FunctionRegistry [datafusion]

2024-07-07 Thread via GitHub
alamb merged PR #11296: URL: https://github.com/apache/datafusion/pull/11296

Re: [PR] Introduce user defined SQL planner API [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11180: URL: https://github.com/apache/datafusion/pull/11180#discussion_r1667731528 ## datafusion/sql/src/expr/mod.rs: ## @@ -341,7 +278,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } }; -

[I] Extract registering default features from `SessionState` and into its own function [datafusion]

2024-07-07 Thread via GitHub
alamb opened a new issue, #11320: URL: https://github.com/apache/datafusion/issues/11320 ### Is your feature request related to a problem or challenge? Using `SessionContext` provides a batteries included experience, as it configures and installs many functions, rewrites, data provide

Re: [I] Break datafusion-catalog code into its own crate [datafusion]

2024-07-07 Thread via GitHub
alamb commented on issue #11182: URL: https://github.com/apache/datafusion/issues/11182#issuecomment-2212515109 > The more I think about this the more I think trying to make `SessionState` a container that doesn't have all the optional features (like parquet support) by default makes sense

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-07 Thread via GitHub
Rachelint commented on code in PR #11295: URL: https://github.com/apache/datafusion/pull/11295#discussion_r1667735581 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -823,11 +819,11 @@ macro_rules! get_data_page_statistics { Floa

Re: [PR] allow alias in predicate [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11307: URL: https://github.com/apache/datafusion/pull/11307#discussion_r1667733900 ## datafusion/core/tests/user_defined/user_defined_sql_planner.rs: ## @@ -86,3 +97,27 @@ async fn test_custom_operators_long_arrow() { ]; assert_batches_eq

Re: [PR] use safe cast in propagate_constraints [datafusion]

2024-07-07 Thread via GitHub
alamb merged PR #11297: URL: https://github.com/apache/datafusion/pull/11297

Re: [I] Error evaluating clause `where COL_BIGINT < 1e100` (Found by SQLancer-NoREC) [datafusion]

2024-07-07 Thread via GitHub
alamb closed issue #11252: Error evaluating clause `where COL_BIGINT < 1e100` (Found by SQLancer-NoREC) URL: https://github.com/apache/datafusion/issues/11252

Re: [PR] use safe cast in propagate_constraints [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11297: URL: https://github.com/apache/datafusion/pull/11297#issuecomment-2212519239 Thanks @Lordworms 🙏

Re: [PR] Implement user defined planner for extract [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11215: URL: https://github.com/apache/datafusion/pull/11215#issuecomment-2212519668 > The issue boils down to SqlToRel::new_with_options(..) not setting the planners. Unfortunately, I can't find a way to retrieve the list of planners set on the context to set them mys

Re: [PR] HashJoin can preserve the right ordering when join type is Right [datafusion]

2024-07-07 Thread via GitHub
ozankabak commented on code in PR #11276: URL: https://github.com/apache/datafusion/pull/11276#discussion_r1667736330 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1332,24 +1339,30 @@ pub(crate) fn adjust_indices_by_join_type( pub(crate) fn append_right_indices( Revie

[PR] Support `IS NULL` and `IS NOT NULL` on Unions [datafusion]

2024-07-07 Thread via GitHub
samuelcolvin opened a new pull request, #11321: URL: https://github.com/apache/datafusion/pull/11321 ## Which issue does this PR close? Closes #11162, replaces #11314 ## Rationale for this change See #11162. ## What changes are included in this PR? * Changes

Re: [I] Union columns can never be `NULL` [datafusion]

2024-07-07 Thread via GitHub
samuelcolvin commented on issue #11162: URL: https://github.com/apache/datafusion/issues/11162#issuecomment-2212529260 I've proposed a fix in #11321.

Re: [PR] Demonstrate unions can't be null [datafusion]

2024-07-07 Thread via GitHub
samuelcolvin closed pull request #11314: Demonstrate unions can't be null URL: https://github.com/apache/datafusion/pull/11314

Re: [PR] Demonstrate unions can't be null [datafusion]

2024-07-07 Thread via GitHub
samuelcolvin commented on PR #11314: URL: https://github.com/apache/datafusion/pull/11314#issuecomment-2212529602 replaced by #11321.

Re: [PR] initial prettier unparse [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1667742546 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".\"value\"

Re: [PR] Support `IS NULL` and `IS NOT NULL` on Unions [datafusion]

2024-07-07 Thread via GitHub
samuelcolvin commented on code in PR #11321: URL: https://github.com/apache/datafusion/pull/11321#discussion_r1667743748 ## datafusion/physical-expr/src/expressions/is_null.rs: ## @@ -100,6 +110,49 @@ impl PhysicalExpr for IsNullExpr { } } +pub(crate) fn union_is_null(un

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11295: URL: https://github.com/apache/datafusion/pull/11295#discussion_r1667745320 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -919,16 +915,28 @@ macro_rules! get_data_page_statistics { })

Re: [I] Improve performance of DataPage statistics extraction using StringBuilder [datafusion]

2024-07-07 Thread via GitHub
alamb commented on issue #11281: URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212541365 > But still somethings confused me now... When using the builder directly, looping in flatten way is obviously slower than looping it in nested way... Maybe the compiler is

Re: [PR] Impl a general get results from stats [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11261: URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2212541638 > @alamb One thing I worry about the narrow api is that, it seems can't be used to support the original optimization of min/max? > > Maybe I misunderstand about it? No, so

Re: [I] Union columns can never be `NULL` [datafusion]

2024-07-07 Thread via GitHub
alamb commented on issue #11162: URL: https://github.com/apache/datafusion/issues/11162#issuecomment-2212542658 > I suppose there's a third option of updating arrow-rs to correctly calculate if a `UnionArray` is null, but I presume that works take much longer It would likely t
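
For context, the Arrow columnar format gives union arrays no validity bitmap of their own: whether a slot is null is decided by the child array that the slot points to. The sketch below models that lookup for a dense union using plain Rust vectors. It is an illustration under stated assumptions, not the arrow-rs `UnionArray` API and not the code proposed in #11321; the `DenseUnion` struct and its fields are hypothetical.

```rust
/// Hypothetical, simplified model of a dense union column.
/// This is NOT the arrow-rs `UnionArray` type.
struct DenseUnion {
    /// For each slot, the index of the child that holds its value.
    type_ids: Vec<usize>,
    /// For each slot, the position of its value inside that child.
    offsets: Vec<usize>,
    /// Child columns; `None` models a null entry in a child.
    children: Vec<Vec<Option<i64>>>,
}

impl DenseUnion {
    /// A slot is null exactly when the child entry it refers to is null.
    fn is_null(&self, slot: usize) -> bool {
        let child = &self.children[self.type_ids[slot]];
        child[self.offsets[slot]].is_none()
    }
}
```

Because the union has no bitmap to consult, a null check that looks only at the union's own (absent) validity buffer reports every slot as non-null, which matches the symptom named in the issue title.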

Re: [PR] Support `IS NULL` and `IS NOT NULL` on Unions [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11321: URL: https://github.com/apache/datafusion/pull/11321#discussion_r1667750918 ## datafusion/physical-expr/src/expressions/is_null.rs: ## @@ -100,6 +110,49 @@ impl PhysicalExpr for IsNullExpr { } } +pub(crate) fn union_is_null(union_arr

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667753674 ## datafusion/expr/src/aggregate_function.rs: ## @@ -39,8 +39,6 @@ pub enum AggregateFunction { Max, Review Comment: this list is quite close to empty 🤞

Re: [PR] Minor: Remove clone in optimizer [datafusion]

2024-07-07 Thread via GitHub
alamb merged PR #11315: URL: https://github.com/apache/datafusion/pull/11315

Re: [PR] Minor: Remove clone in optimizer [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11315: URL: https://github.com/apache/datafusion/pull/11315#discussion_r1667753904 ## datafusion/optimizer/src/eliminate_outer_join.rs: ## @@ -109,9 +110,10 @@ impl OptimizerRule for EliminateOuterJoin { } else {

[PR] fix: Remove original plan parameter from CometNativeExec [datafusion-comet]

2024-07-07 Thread via GitHub
viirya opened a new pull request, #639: URL: https://github.com/apache/datafusion-comet/pull/639 ## Which issue does this PR close? Closes #594. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Implement TPCH substrait integration test, support tpch_4 and tpch_5 [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11311: URL: https://github.com/apache/datafusion/pull/11311#discussion_r1667754415 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1297,6 +1297,32 @@ pub async fn from_substrait_rex( outer_ref_columns,

Re: [PR] minor: Add `PhysicalSortExpr::new` [datafusion]

2024-07-07 Thread via GitHub
alamb merged PR #11310: URL: https://github.com/apache/datafusion/pull/11310

Re: [PR] feat: Use unified allocator for execution iterators [datafusion-comet]

2024-07-07 Thread via GitHub
viirya commented on PR #613: URL: https://github.com/apache/datafusion-comet/pull/613#issuecomment-2212554468 The OOM issue of some TPCDS queries in CI will be fixed by #639.

[I] `SanityCheckPlan` failed on `sql_planning`1 benchmark: , Plan("Child: [\"ProjectionExec: expr=[]\", \" CoalesceBatchesExec: target_batch_size=8192\", [datafusion]

2024-07-07 Thread via GitHub
alamb opened a new issue, #11322: URL: https://github.com/apache/datafusion/issues/11322 ### Describe the bug The `SanityCheckPlan` pass (added in https://github.com/apache/datafusion/pull/11196) is now failing when I run `cargo bench --bench sql_planner` to test planning speed

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11196: URL: https://github.com/apache/datafusion/pull/11196#issuecomment-2212555039 FYI this check is now failing on one of the sql benchmarks: https://github.com/apache/datafusion/issues/11322

Re: [PR] Improve stats convert performance [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11319: URL: https://github.com/apache/datafusion/pull/11319#discussion_r1667756939 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -875,14 +914,14 @@ macro_rules! get_data_page_statistics { Some(DataTyp

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667759675 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFrame

Re: [PR] Support `IS NULL` and `IS NOT NULL` on Unions [datafusion]

2024-07-07 Thread via GitHub
samuelcolvin commented on PR #11321: URL: https://github.com/apache/datafusion/pull/11321#issuecomment-2212567632 @alamb, I agree on your comments, I'll get those things fixed tomorrow.

Re: [I] `SanityCheckPlan` failed on `sql_planning`1 benchmark: , Plan("Child: [\"ProjectionExec: expr=[]\", \" CoalesceBatchesExec: target_batch_size=8192\", [datafusion]

2024-07-07 Thread via GitHub
ozankabak commented on issue #11322: URL: https://github.com/apache/datafusion/issues/11322#issuecomment-2212573441 We will check and fix tomorrow

Re: [I] Implement initial version of to_json [datafusion-comet]

2024-07-07 Thread via GitHub
jatin510 commented on issue #631: URL: https://github.com/apache/datafusion-comet/issues/631#issuecomment-2212574753 Hello @andygrove I would like to work on this issue. Can you please assign it to me

[I] Support Qualified Wildcard in Count [datafusion]

2024-07-07 Thread via GitHub
tshauck opened a new issue, #11323: URL: https://github.com/apache/datafusion/issues/11323 ### Is your feature request related to a problem or challenge? In working on https://github.com/apache/datafusion/pull/11229 I noticed that `SELECT COUNT(t1.*) FROM t1` doesn't work and throws a

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-07 Thread via GitHub
dharanad commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667764545 ## datafusion/sql/src/expr/mod.rs: ## @@ -629,6 +630,36 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } } +/// Parses a struct(..) express

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-07 Thread via GitHub
dharanad commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667401277 ## datafusion/functions/src/core/planner.rs: ## @@ -38,3 +40,28 @@ impl UserDefinedSQLPlanner for CoreFunctionPlanner { Ok(PlannerResult::Planned(named_s

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667765044 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFrame

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667765100 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFrame

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-07 Thread via GitHub
efredine commented on code in PR #11295: URL: https://github.com/apache/datafusion/pull/11295#discussion_r1667766119 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -823,11 +819,11 @@ macro_rules! get_data_page_statistics { Float

Re: [PR] Improve stats convert performance [datafusion]

2024-07-07 Thread via GitHub
efredine commented on code in PR #11319: URL: https://github.com/apache/datafusion/pull/11319#discussion_r1667766837 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -747,10 +770,10 @@ macro_rules! get_data_page_statistics { Some(Data

[PR] Improve `DataFrame` Users Guide [datafusion]

2024-07-07 Thread via GitHub
alamb opened a new pull request, #11324: URL: https://github.com/apache/datafusion/pull/11324 ## Which issue does this PR close? Part of #3058 ## Rationale for this change While responding to comments from @efredine on https://github.com/apache/datafusion/pull/11290, I

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-07 Thread via GitHub
tshauck commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2212583535 Hi @jayzhan211, I took another look at this and have some uncertainty I'd like to get feedback on. From what I can tell, it seems like I need to keep in the wildcard rule i

Re: [PR] Improve `DataFrame` Users Guide [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11324: URL: https://github.com/apache/datafusion/pull/11324#discussion_r1667768597 ## datafusion/core/src/lib.rs: ## @@ -626,6 +626,12 @@ doc_comment::doctest!( user_guide_configs ); +#[cfg(doctest)] +doc_comment::doctest!( Review Comment:

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667769389 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFrame

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
efredine commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667767853 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,268 @@ # Using the DataFrame API -## What is a DataFrame +The [Users Guide] i

Re: [PR] Improve stats convert performance [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11319: URL: https://github.com/apache/datafusion/pull/11319#issuecomment-2212584926 > I suspect the remaining cases where we are using collect could be made more efficient using the Builder pattern? I think the reason the Builder is faster for Strings / Binary i

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667769874 ## datafusion/sql/src/expr/mod.rs: ## @@ -629,6 +630,36 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } } +/// Parses a struct(..) expression

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11295: URL: https://github.com/apache/datafusion/pull/11295#issuecomment-2212585411 Let's merge this one in so we can proceed with getting https://github.com/apache/datafusion/pull/11319 ready

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11295: URL: https://github.com/apache/datafusion/pull/11295#issuecomment-2212585423 Thanks again! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-07 Thread via GitHub
alamb merged PR #11295: URL: https://github.com/apache/datafusion/pull/11295 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Incorrect statistics extracted from parquet data pages when all values are null [datafusion]

2024-07-07 Thread via GitHub
alamb closed issue #11280: Incorrect statistics extracted from parquet data pages when all values are null URL: https://github.com/apache/datafusion/issues/11280 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667770413 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,268 @@ # Using the DataFrame API -## What is a DataFrame +The [Users Guide] intr

Re: [I] Implement initial version of to_json [datafusion-comet]

2024-07-07 Thread via GitHub
viirya commented on issue #631: URL: https://github.com/apache/datafusion-comet/issues/631#issuecomment-2212586416 Thanks @jatin510 . Assigned to you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11290: URL: https://github.com/apache/datafusion/pull/11290#issuecomment-2212586526 > And in the near future we'll be able to turn it back into SQL which probably wouldn't belong here but is cool all the same ;-). I actually think we can do it now:

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-07 Thread via GitHub
alamb commented on PR #11290: URL: https://github.com/apache/datafusion/pull/11290#issuecomment-2212586728 > Feel free to tag me on these example changes. I share your view that reviewing and refining documentation and examples is high impact and it's a great way for me to continue learning

Re: [PR] Implement user defined planner for `create_struct` & `create_named_struct` [datafusion]

2024-07-07 Thread via GitHub
dharanad commented on code in PR #11273: URL: https://github.com/apache/datafusion/pull/11273#discussion_r1667772206 ## datafusion/sql/src/expr/mod.rs: ## @@ -629,6 +630,36 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } } +/// Parses a struct(..) express

Re: [I] bug: upper and lower not compatible with Spark for international character sets [datafusion-comet]

2024-07-07 Thread via GitHub
Lordworms commented on issue #483: URL: https://github.com/apache/datafusion-comet/issues/483#issuecomment-2212601870 Hi @andygrove, could you please provide the actual setup for this query? Since I tried it locally to compare the differences between Comet and raw Spark, it appears to be t

Re: [PR] Implement TPCH substrait integration test, support tpch_4 and tpch_5 [datafusion]

2024-07-07 Thread via GitHub
Lordworms commented on code in PR #11311: URL: https://github.com/apache/datafusion/pull/11311#discussion_r1667780236 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1297,6 +1297,32 @@ pub async fn from_substrait_rex( outer_ref_columns,

Re: [PR] Implement prettier SQL unparsing (more human readable) [datafusion]

2024-07-07 Thread via GitHub
phillipleblanc commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1667795643 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".\

Re: [I] Create a logo for the Comet project [datafusion-comet]

2024-07-07 Thread via GitHub
andygrove commented on issue #596: URL: https://github.com/apache/datafusion-comet/issues/596#issuecomment-2212661270 Thanks for all the submissions so far! It is hard to tell which of these are AI generated or not, and I think it will be important to have an SVG version of the logo, which

Re: [PR] Improve stats convert performance for Binary/String arrays [datafusion]

2024-07-07 Thread via GitHub
Rachelint commented on code in PR #11319: URL: https://github.com/apache/datafusion/pull/11319#discussion_r1667817843 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -392,51 +393,73 @@ macro_rules! get_statistics { }) },

Re: [PR] Improve stats convert performance for Binary/String arrays [datafusion]

2024-07-07 Thread via GitHub
Rachelint commented on code in PR #11319: URL: https://github.com/apache/datafusion/pull/11319#discussion_r1667820117 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -747,10 +770,10 @@ macro_rules! get_data_page_statistics { Some(Dat

Re: [PR] Improve stats convert performance for Binary/String arrays [datafusion]

2024-07-07 Thread via GitHub
Rachelint commented on PR #11319: URL: https://github.com/apache/datafusion/pull/11319#issuecomment-2212686882 > > I suspect the remaining cases where we are using collect could be made more efficient using the Builder pattern? > > I think the reason the Builder is faster for Strings
