[PR] Call function for indexing in parser [datafusion]

2024-05-04 Thread via GitHub
jayzhan211 opened a new pull request, #10375: URL: https://github.com/apache/datafusion/pull/10375 ## Which issue does this PR close? Part of #10374 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Call function for indexing in parser [datafusion]

2024-05-04 Thread via GitHub
jayzhan211 commented on code in PR #10375: URL: https://github.com/apache/datafusion/pull/10375#discussion_r1589915206 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -2325,28 +2325,142 @@ host3 3.3 # can have an aggregate function with an inner CASE WHEN query TR -sel

Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-04 Thread via GitHub
prashantksharma commented on issue #361: URL: https://github.com/apache/datafusion-comet/issues/361#issuecomment-2094060848 @andygrove , cc: @viirya ## Minor Query before opening PR Summary: - Have made changes to `expr.proto` and `QueryPlanSerde.scala`. The changes have

[I] Support user defined display for UDF [datafusion]

2024-05-04 Thread via GitHub
jayzhan211 opened a new issue, #10376: URL: https://github.com/apache/datafusion/issues/10376 ### Is your feature request related to a problem or challenge? The feature request is based on the need to have `get_field(expr, key)` displayed as `expr[key]`. two reason
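
The idea in the issue above, a function that renders as `expr[key]` rather than `get_field(expr, key)`, can be sketched as a trait with an overridable display hook. This is only an illustration: the `ScalarFn` trait, its `display_name` method, and the `GetField` and `Abs` types are hypothetical names and are not DataFusion's actual API.

```rust
/// Hypothetical stand-in for a UDF trait; not DataFusion's real `ScalarUDFImpl`.
trait ScalarFn {
    fn name(&self) -> &str;

    /// Default rendering: `name(arg1, arg2, ...)`.
    fn display_name(&self, args: &[String]) -> String {
        format!("{}({})", self.name(), args.join(", "))
    }
}

/// Hypothetical function that keeps the default rendering.
struct Abs;

impl ScalarFn for Abs {
    fn name(&self) -> &str {
        "abs"
    }
}

/// Hypothetical function that overrides the rendering with index syntax.
struct GetField;

impl ScalarFn for GetField {
    fn name(&self) -> &str {
        "get_field"
    }

    fn display_name(&self, args: &[String]) -> String {
        match args {
            // Exactly two arguments: render as `expr[key]`.
            [expr, key] => format!("{expr}[{key}]"),
            // Any other arity: fall back to the default call syntax.
            _ => format!("{}({})", self.name(), args.join(", ")),
        }
    }
}
```

With this sketch, `GetField.display_name(&["col".to_string(), "k".to_string()])` yields `col[k]`, while `Abs` keeps the `abs(x)` form.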

Re: [PR] Support OrderByExpr in Unparsed [datafusion]

2024-05-04 Thread via GitHub
alamb commented on code in PR #10370: URL: https://github.com/apache/datafusion/pull/10370#discussion_r1589965987 ## datafusion/sql/src/unparser/expr.rs: ## @@ -49,6 +81,15 @@ pub fn expr_to_sql(expr: &Expr) -> Result { unparser.expr_to_sql(expr) } +/// Convert a DataFus

Re: [I] Support OrderBy and Sort in Expr->String [datafusion]

2024-05-04 Thread via GitHub
alamb closed issue #10256: Support OrderBy and Sort in Expr->String URL: https://github.com/apache/datafusion/issues/10256 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Unparser: Support `ORDER BY` in window function definition [datafusion]

2024-05-04 Thread via GitHub
alamb merged PR #10370: URL: https://github.com/apache/datafusion/pull/10370

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2024-05-04 Thread via GitHub
alamb commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2094127979 Thank you @yyy1000 🙏 I think a good place to start would be to write some sqllogic level tests to cover the important cases Perhaps for the first test: 1. Create

Re: [PR] Fix `coalesce`, `struct` and `named_strct` expr_fn function to take multiple arguments [datafusion]

2024-05-04 Thread via GitHub
alamb commented on code in PR #10321: URL: https://github.com/apache/datafusion/pull/10321#discussion_r1589968630 ## datafusion/functions/src/core/mod.rs: ## @@ -39,14 +42,68 @@ make_udf_function!(getfield::GetFieldFunc, GET_FIELD, get_field); make_udf_function!(coalesce::Coal

Re: [I] DataFusion weekly project plan (Andrew Lamb) - April 29, 2024 [datafusion]

2024-05-04 Thread via GitHub
alamb commented on issue #10283: URL: https://github.com/apache/datafusion/issues/10283#issuecomment-2094129208 Sorry @liukun -- my bad

Re: [I] Comet sort order different to Spark for 0.0 and -0.0 [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on issue #353: URL: https://github.com/apache/datafusion-comet/issues/353#issuecomment-2094147096 It is probably worth having a section in the compatibility guide specifically for Rust vs Java differences like this.

Re: [I] [EPIC] (Even More) Grouping / Group By / Aggregation Performance [datafusion]

2024-05-04 Thread via GitHub
karlovnv commented on issue #7000: URL: https://github.com/apache/datafusion/issues/7000#issuecomment-2094202184 Hi! Great work has been done here! I ran into an issue with CoalesceBatches: it seems that there is a performance killer somewhere in CoalesceBatchesStream. It's spe

Re: [I] [EPIC] (Even More) Grouping / Group By / Aggregation Performance [datafusion]

2024-05-04 Thread via GitHub
karlovnv commented on issue #7000: URL: https://github.com/apache/datafusion/issues/7000#issuecomment-2094206289 Another related issue is the performance of **RowConverter** used for grouping. More than 75% of GroupedHashAggregateStream work is converting composite aggregation key

[I] Remove DataPtr trait and use Arc::ptr_eq directly [datafusion]

2024-05-04 Thread via GitHub
intoraw opened a new issue, #10377: URL: https://github.com/apache/datafusion/issues/10377 ### Is your feature request related to a problem or challenge? https://github.com/rust-lang/rust/pull/106450 Arc::ptr_eq compares the underlying pointer without metadata, it should be safe t

[PR] chore: remove DataPtr trait since Arc::ptr_eq ignores pointer metadata [datafusion]

2024-05-04 Thread via GitHub
intoraw opened a new pull request, #10378: URL: https://github.com/apache/datafusion/pull/10378 ## Which issue does this PR close? Closes #10377. ## Rationale for this change According to https://github.com/rust-lang/rust/pull/106450, Arc::ptr_eq now compares two poi
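
As a rough illustration of the point made in this PR, `Arc::ptr_eq` can be called directly on trait-object `Arc`s to check whether two handles share one allocation. The `Plan` trait, the `Scan` type, and the `same_allocation` helper below are hypothetical names used only for this sketch; they are not taken from the DataFusion code base.

```rust
use std::sync::Arc;

/// Hypothetical trait standing in for any trait object held behind an `Arc`.
trait Plan {
    fn name(&self) -> &str;
}

/// Hypothetical implementor.
struct Scan {
    table: String,
}

impl Plan for Scan {
    fn name(&self) -> &str {
        &self.table
    }
}

/// Returns true when both `Arc`s point at the same allocation.
/// No helper trait is needed; `Arc::ptr_eq` accepts unsized `T` such as `dyn Plan`.
fn same_allocation(a: &Arc<dyn Plan>, b: &Arc<dyn Plan>) -> bool {
    Arc::ptr_eq(a, b)
}
```

A clone of an `Arc` compares equal to its source under this check, while a second `Arc::new` with identical contents does not, because the comparison is on the allocation and not on the value.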

Re: [I] Implement Spark-compatible cast to/from binary type [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on issue #377: URL: https://github.com/apache/datafusion-comet/issues/377#issuecomment-2094222085 Thanks @mattharder91. Feel free to break this down into smaller issues if needed e.g. `string <-> binary`, `integers <-> binary` and so on. There is a little more inf

Re: [PR] docs: Add DataFusion subprojects to navigation menu, other minor updates [datafusion]

2024-05-04 Thread via GitHub
comphead merged PR #10362: URL: https://github.com/apache/datafusion/pull/10362

Re: [PR] feat: Improve cast compatibility tests and docs [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on code in PR #379: URL: https://github.com/apache/datafusion-comet/pull/379#discussion_r1590008164 ## spark/src/main/scala/org/apache/comet/GenerateDocs.scala: ## @@ -64,23 +64,36 @@ object GenerateDocs { val outputFilename = "docs/source/user-guide/com

Re: [PR] feat: Improve cast compatibility tests and docs [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on code in PR #379: URL: https://github.com/apache/datafusion-comet/pull/379#discussion_r1590008334 ## spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala: ## @@ -253,7 +253,8 @@ class CometExecSuite extends CometTestBase { dataTypes.map { su

Re: [I] Write a guide on contributing a new expression [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on issue #370: URL: https://github.com/apache/datafusion-comet/issues/370#issuecomment-2094296990 And I suppose we also need a separate guide on adding cast expressions since there are specific test considerations

[PR] Add sqllogictest and enable split_file_groups_by_statistics [datafusion]

2024-05-04 Thread via GitHub
yyy1000 opened a new pull request, #10381: URL: https://github.com/apache/datafusion/pull/10381 ## Which issue does this PR close? Closes #10336 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] Add sqllogictest and enable split_file_groups_by_statistics [datafusion]

2024-05-04 Thread via GitHub
yyy1000 commented on PR #10381: URL: https://github.com/apache/datafusion/pull/10381#issuecomment-2094302886 I think it's better to push this though not finished to get early comments to better modify the code. 😀 Don't know whether I totally understand what https://github.com/apache/data

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-04 Thread via GitHub
tshauck commented on PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#issuecomment-2094372353 I think this is ready for review. I updated the `unhex` impl to be more faithful to Spark's (for odd-length inputs in particular), added better null handling, and added more tests
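
For readers unfamiliar with the odd-length case mentioned above, here is a minimal sketch of one way to decode hex with odd-length input. It assumes that an odd-length string is treated as if a leading `0` were prepended and that any invalid character yields a null result (modelled here as `None`). That behaviour is an assumption for illustration and is not copied from the PR; the function names `hex_digit` and `unhex` are also hypothetical.

```rust
/// Map one ASCII hex character to its 4-bit value, or None if it is not a hex digit.
fn hex_digit(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Decode a hex string into bytes.
/// Assumption: odd-length input is decoded as if prefixed with '0',
/// and any non-hex character makes the whole result None.
fn unhex(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity((bytes.len() + 1) / 2);
    let mut rest = bytes;

    // Odd length: the first character forms a byte on its own.
    if bytes.len() % 2 == 1 {
        out.push(hex_digit(bytes[0])?);
        rest = &bytes[1..];
    }

    // Remaining characters are consumed two at a time.
    for pair in rest.chunks_exact(2) {
        out.push((hex_digit(pair[0])? << 4) | hex_digit(pair[1])?);
    }
    Some(out)
}
```

Under these assumptions, `unhex("4142")` gives the bytes `0x41, 0x42`, `unhex("414")` gives `0x04, 0x14`, and `unhex("zz")` gives `None`.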

[PR] fix: Disable Comet shuffle with AQE coalesce partitions enabled [datafusion-comet]

2024-05-04 Thread via GitHub
viirya opened a new pull request, #380: URL: https://github.com/apache/datafusion-comet/pull/380 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

[I] Detected memory leak on Comet columnar shuffle when AQE coalesce partitions enabled [datafusion-comet]

2024-05-04 Thread via GitHub
viirya opened a new issue, #381: URL: https://github.com/apache/datafusion-comet/issues/381 ### Describe the bug There are a few test failures caused by memory leak reported by Java Arrow. They are found in #250 after enabling columnar shuffle by default on Spark SQL tests. For examp

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-04 Thread via GitHub
tshauck commented on PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#issuecomment-2094451349 Err... looks to be an issue w/ spark 3.2 I'll need to look into. Hopefully the majority of the code'll remain unchanged. https://github.com/apache/datafusion-comet/assets/421

Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on issue #361: URL: https://github.com/apache/datafusion-comet/issues/361#issuecomment-2094518806 On the Rust side you will need a `match` statement to convert the protobuf i32 to the Rust enum (0 -> legacy, 1 -> try, 2 -> ansi). Perhaps take a look at how we handle one
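
The `match` described in this comment might look roughly like the sketch below. The mapping (0 -> legacy, 1 -> try, 2 -> ansi) comes from the comment; the enum name `EvalMode`, the function name `eval_mode_from_i32`, and the `String` error type are assumptions made for illustration and may not match the Comet code.

```rust
/// Hypothetical Rust-side enum for the CAST evaluation mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvalMode {
    Legacy,
    Try,
    Ansi,
}

/// Convert the raw protobuf enum value (an i32 on the wire) to `EvalMode`.
/// Unknown values are reported as an error; they are not mapped to a default.
pub fn eval_mode_from_i32(value: i32) -> Result<EvalMode, String> {
    match value {
        0 => Ok(EvalMode::Legacy),
        1 => Ok(EvalMode::Try),
        2 => Ok(EvalMode::Ansi),
        other => Err(format!("invalid eval mode: {other}")),
    }
}
```

Returning an error for out-of-range values is one possible design choice; it surfaces a mismatch between the serialized plan and the native side early and avoids silently falling back to one mode.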

Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-04 Thread via GitHub
andygrove commented on issue #361: URL: https://github.com/apache/datafusion-comet/issues/361#issuecomment-2094526194 @prashantksharma also, feel free to create a draft PR as it can be easier for maintainers to make suggestions on the PR

Re: [PR] UpdateD pool.rs [datafusion]

2024-05-04 Thread via GitHub
github-actions[bot] commented on PR #6943: URL: https://github.com/apache/datafusion/pull/6943#issuecomment-2094543878 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Add `async` UDF example [datafusion]

2024-05-04 Thread via GitHub
github-actions[bot] commented on PR #6713: URL: https://github.com/apache/datafusion/pull/6713#issuecomment-2094543917 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-04 Thread via GitHub
prashantksharma commented on issue #361: URL: https://github.com/apache/datafusion-comet/issues/361#issuecomment-2094569664 @andygrove Thank you so much for your feedback: > On the Rust side you will need a match statement to convert the protobuf i32 to the Rust enum (0 -> le

Re: [I] Support user defined display for UDF [datafusion]

2024-05-04 Thread via GitHub
yyy1000 commented on issue #10376: URL: https://github.com/apache/datafusion/issues/10376#issuecomment-2094626818 Looks like it doesn't need a lot of changes. Since I worked on another issue related to `display_name`, I want to take this. :)

Re: [I] Support user defined display for UDF [datafusion]

2024-05-04 Thread via GitHub
yyy1000 commented on issue #10376: URL: https://github.com/apache/datafusion/issues/10376#issuecomment-2094634759 I'm working on it, and I found that waiting until https://github.com/apache/datafusion/pull/10325 is merged will avoid a conflict. Given it's close to being merged, I think we can wait a mom

[PR] Move Sum aggregate function test to slt [datafusion]

2024-05-04 Thread via GitHub
jayzhan211 opened a new pull request, #10382: URL: https://github.com/apache/datafusion/pull/10382 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested