Re: [I] Proposal to Introduce Ray SQL into DataFusion Python [datafusion-python]

2024-09-16 Thread via GitHub
austin362667 commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2352212813 Thanks @ozankabak I'm curious on the boundary, too. One of the goals in [Ray SQL](https://github.com/datafusion-contrib/ray-sql?tab=readme-ov-file#goals) is "Drive r

[PR] Update substrait requirement from 0.41 to 0.42 [datafusion]

2024-09-16 Thread via GitHub
dependabot[bot] opened a new pull request, #12483: URL: https://github.com/apache/datafusion/pull/12483 Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. Release notes Sourced from https://github.com/substrait-io/su

Re: [I] Proposal to Introduce Ray SQL into DataFusion Python [datafusion-python]

2024-09-16 Thread via GitHub
ozankabak commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2352408538 Indeed. Do you have an idea of what the subset would look like? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] test(substrait): update TPCH tests [datafusion]

2024-09-16 Thread via GitHub
Blizzara commented on PR #12462: URL: https://github.com/apache/datafusion/pull/12462#issuecomment-2352487802 LGTM, thanks!

[PR] Make `required_guarantees` output to be deterministic [datafusion]

2024-09-16 Thread via GitHub
austin362667 opened a new pull request, #12484: URL: https://github.com/apache/datafusion/pull/12484 ## Which issue does this PR close? Closes #12473. ## Rationale for this change Well explained in the issue. ## What changes are included in this PR?

Re: [I] first_value and last_value should have identical signatures [datafusion]

2024-09-16 Thread via GitHub
dmitrybugakov commented on issue #12376: URL: https://github.com/apache/datafusion/issues/12376#issuecomment-2352873780 @timsaucer Could you provide more details about what you mean by ‘nice to have identical function signatures’? From what I’ve reviewed in both the code and documentati

Re: [I] LargeList and List type coercion not working in `CASE WHEN` [datafusion]

2024-09-16 Thread via GitHub
goldmedal commented on issue #12370: URL: https://github.com/apache/datafusion/issues/12370#issuecomment-2352883021 take

Re: [PR] minor: rename CometMetricNode `add` to `set` and update documentation [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove merged PR #940: URL: https://github.com/apache/datafusion-comet/pull/940

Re: [I] Proposal to Introduce Ray SQL into DataFusion Python [datafusion-python]

2024-09-16 Thread via GitHub
andygrove commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2352941183 To expand on this, there are several options for adopting the Ray SQL code as part of the DataFusion project. 1. Create new repo `datafusion-ray-sql` (possibly renam

Re: [PR] Improve SQLite subquery tables aliasing unparsing [datafusion]

2024-09-16 Thread via GitHub
dmitrybugakov commented on code in PR #12482: URL: https://github.com/apache/datafusion/pull/12482#discussion_r1761175453 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -258,8 +258,36 @@ pub(super) fn subquery_alias_inner_query_and_columns( (outer_projections.input.as_ref

Re: [I] [EPIC] Performance focus for 0.2.0 Release [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove closed issue #717: [EPIC] Performance focus for 0.2.0 Release URL: https://github.com/apache/datafusion-comet/issues/717

Re: [I] `Utf8View` column produced incorrect result in a natural join query (SQLancer-NoREC) [datafusion]

2024-09-16 Thread via GitHub
dmitrybugakov commented on issue #12468: URL: https://github.com/apache/datafusion/issues/12468#issuecomment-2352964550 take

[PR] chore: Upgrade to DataFusion 42.0.0-rc1 [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove opened a new pull request, #945: URL: https://github.com/apache/datafusion-comet/pull/945 ## Which issue does this PR close? N/A ## Rationale for this change Check for any regressions before the 42.0.0 release is finalized. ## What changes

[PR] chore: bump chrono to 0.4.38 [datafusion]

2024-09-16 Thread via GitHub
my-vegetable-has-exploded opened a new pull request, #12485: URL: https://github.com/apache/datafusion/pull/12485 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? I met some p

Re: [I] Proposal to Introduce Ray SQL into DataFusion Python [datafusion-python]

2024-09-16 Thread via GitHub
vakarisbk commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2353006349 100% yes. If Ray SQL can handle all TPC-H queries with just 1.7k lines of code, it sounds like a no-brainer. This actually makes a production-ready distributed DataFusio

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12135: URL: https://github.com/apache/datafusion/pull/12135#issuecomment-2353009724 I plan to merge this tomorrow to give people time to comment / respond if desired Thanks again

[PR] Update datafusion protobuf definitions [datafusion-ballista]

2024-09-16 Thread via GitHub
palaska opened a new pull request, #1057: URL: https://github.com/apache/datafusion-ballista/pull/1057 # Which issue does this PR close? Closes #. # Rationale for this change Keep things in sync with the currently used Datafusion version. # What changes ar

[PR] Remove deprecated config setup functions [datafusion]

2024-09-16 Thread via GitHub
findepi opened a new pull request, #12486: URL: https://github.com/apache/datafusion/pull/12486 These were deprecated since v 32. 🧹

[PR] Remove deprecated ScalarUDF::new [datafusion]

2024-09-16 Thread via GitHub
findepi opened a new pull request, #12487: URL: https://github.com/apache/datafusion/pull/12487 It was deprecated since v 34. This also removes associated `ScalarUdfLegacyWrapper` supporting the removed function. Note that similar `SimpleScalarUDF` is retained, thus the functionality tha

[PR] Remove Arc wrapping from create_udf's return_type [datafusion]

2024-09-16 Thread via GitHub
findepi opened a new pull request, #12489: URL: https://github.com/apache/datafusion/pull/12489 The argument types are moved into `create_udf` so moving also `return_type` would increase API consistency. Internally, the `create_udf` unwrapped or cloned (so moves) the passed in return

[PR] Remove ScalarValue::Dictionary [datafusion]

2024-09-16 Thread via GitHub
findepi opened a new pull request, #12488: URL: https://github.com/apache/datafusion/pull/12488 `ScalarValue` should be a container for a single nullable logical type and should not be concerned by various physical encodings used in arrays. It doesn't involve arrays even as part of internal

[PR] Support List, FixedSizeList and LargeList type coercion for comparison [datafusion]

2024-09-16 Thread via GitHub
goldmedal opened a new pull request, #12490: URL: https://github.com/apache/datafusion/pull/12490 ## Which issue does this PR close? Closes #12370. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] Remove ScalarValue::Dictionary [datafusion]

2024-09-16 Thread via GitHub
findepi commented on PR #12488: URL: https://github.com/apache/datafusion/pull/12488#issuecomment-2353208973 This won't pass the test yet, but creating the PR to have a conversation first. cc @alamb @comphead especially if we go with @notfilippo's https://github.com/apache/datafusio

Re: [PR] Remove ScalarValue::Dictionary [datafusion]

2024-09-16 Thread via GitHub
notfilippo commented on PR #12488: URL: https://github.com/apache/datafusion/pull/12488#issuecomment-2353215551 Just posting it here for reference as this PR overlaps with some of the work in this other PR: https://github.com/apache/datafusion/pull/11978

Re: [PR] Support List, FixedSizeList and LargeList type coercion for comparison [datafusion]

2024-09-16 Thread via GitHub
goldmedal commented on code in PR #12490: URL: https://github.com/apache/datafusion/pull/12490#discussion_r1761372361 ## datafusion/sqllogictest/test_files/case.slt: ## @@ -108,3 +108,54 @@ SELECT CASE WHEN false THEN 1 ELSE 0 END FROM foo 0 0 0 + +# List(Utf8) will be casted

Re: [PR] Support List, FixedSizeList and LargeList type coercion for comparison [datafusion]

2024-09-16 Thread via GitHub
goldmedal commented on PR #12490: URL: https://github.com/apache/datafusion/pull/12490#issuecomment-2353251978 I tried to add some sql test for `FixedSizeList` case but I can't find the way to create it through SQL API 🤔 ``` > select arrow_cast([1,2,3], 'FixedSizeList(Int64, 10)');

Re: [PR] Improve SQLite subquery tables aliasing unparsing [datafusion]

2024-09-16 Thread via GitHub
sgrebnov commented on code in PR #12482: URL: https://github.com/apache/datafusion/pull/12482#discussion_r1761390609 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -258,8 +258,36 @@ pub(super) fn subquery_alias_inner_query_and_columns( (outer_projections.input.as_ref(), c

[PR] Join reorder to support hash joins with projections [datafusion]

2024-09-16 Thread via GitHub
onursatici opened a new pull request, #12491: URL: https://github.com/apache/datafusion/pull/12491 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

[I] [substrait] TPCH Plan 15 Is Empty [datafusion]

2024-09-16 Thread via GitHub
vbarua opened a new issue, #12492: URL: https://github.com/apache/datafusion/issues/12492 ### Is your feature request related to a problem or challenge? Due to the upstream issue in https://github.com/substrait-io/consumer-testing/issues/108 the Substrait plan for TPCH query 15 is em

Re: [PR] test(substrait): update TPCH tests [datafusion]

2024-09-16 Thread via GitHub
vbarua commented on code in PR #12462: URL: https://github.com/apache/datafusion/pull/12462#discussion_r1761484267 ## datafusion/substrait/tests/cases/consumer_integration.rs: ## @@ -24,569 +24,435 @@ #[cfg(test)] mod tests { +use crate::utils::test::add_plan_schemas_to_

Re: [I] CometSparkToColumnar should have different name for row vs columnar input [datafusion-comet]

2024-09-16 Thread via GitHub
parthchandra commented on issue #936: URL: https://github.com/apache/datafusion-comet/issues/936#issuecomment-2353421719 Maybe just override the `simpleString` and `verboseString` methods?

Re: [I] Improve aggregation code readability [datafusion]

2024-09-16 Thread via GitHub
Rachelint commented on issue #12335: URL: https://github.com/apache/datafusion/issues/12335#issuecomment-2353434433 > This would be a valuable improvement. Now the execution behavior is determined by `AggregateMode` + other more subtle inner states like `spill_state.xxx`, and some functions

[I] [substrait] Add support for enum arguments [datafusion]

2024-09-16 Thread via GitHub
vbarua opened a new issue, #12493: URL: https://github.com/apache/datafusion/issues/12493 ### Is your feature request related to a problem or challenge? Substrait has [3 different types](https://substrait.io/expressions/scalar_functions/#argument-types) of arguments for functions. On

Re: [PR] test(substrait): update TPCH tests [datafusion]

2024-09-16 Thread via GitHub
vbarua commented on PR #12462: URL: https://github.com/apache/datafusion/pull/12462#issuecomment-2353444665 I've created the following to track the issues with the TPCH queries: * https://github.com/apache/datafusion/issues/12492 * https://github.com/apache/datafusion/issues/12493

Re: [PR] test(substrait): update TPCH tests [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12462: URL: https://github.com/apache/datafusion/pull/12462#issuecomment-2353456565 Thanks again @vbarua and @Blizzara

Re: [PR] test(substrait): update TPCH tests [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12462: URL: https://github.com/apache/datafusion/pull/12462

Re: [PR] chore: Upgrade to DataFusion 42.0.0-rc1 [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove commented on PR #945: URL: https://github.com/apache/datafusion-comet/pull/945#issuecomment-2353458871 I will update this PR to use the official DataFusion 42.0.0 release once it is available on crates.io (should be tomorrow).

[I] DataFusion weekly project plan (Andrew Lamb) - Sep 16, 2024 [datafusion]

2024-09-16 Thread via GitHub
alamb opened a new issue, #12494: URL: https://github.com/apache/datafusion/issues/12494 Follow on to https://github.com/apache/datafusion/issues/12336 My (personal) North ⭐ : 1000 projects are built using DataFusion 📈 ( Getting closer all the time ) **It would be great for oth

Re: [I] Comet library not initializing [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove closed issue #773: Comet library not initializing URL: https://github.com/apache/datafusion-comet/issues/773

Re: [I] Comet library not initializing [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove commented on issue #773: URL: https://github.com/apache/datafusion-comet/issues/773#issuecomment-2353485069 It looks like this issue is inactive now, so I will close it. Feel free to reopen it @zelda89 if you still need assistance.

Re: [I] 2024 Q3-Q4 Roadmap? [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #11442: URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2353490505 > I think especially in the RND world (industrial and academic), Datafusion makes research easier and more interesting, since you're starting from a already-present foundation an

Re: [I] 2024 Q3-Q4 Roadmap? [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #11442: URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2353494247 > I think especially in Europe, Datafusion is still not as well-known, and if more DBMS people were to know about it, it would be beneficial to the future of the project. I

[PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-16 Thread via GitHub
parthchandra opened a new pull request, #946: URL: https://github.com/apache/datafusion-comet/pull/946 ## Which issue does this PR close? Part of #721 Draft: version, built on top of #932, requires some additional setup in nexus before it will work.

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1761569452 ## dev/release/comet-rm/build-comet-native-libs.sh: ## @@ -0,0 +1,52 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1761570516 ## dev/release/comet-rm/build-comet-native-libs.sh: ## @@ -0,0 +1,52 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-16 Thread via GitHub
comphead commented on code in PR #12135: URL: https://github.com/apache/datafusion/pull/12135#discussion_r1761572859 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -789,19 +792,22 @@ impl TableProvider for ListingTable { .map(|col| Ok(self.table_schema.fi

Re: [PR] chore: Add config for enabling SMJ with join condition [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove merged PR #937: URL: https://github.com/apache/datafusion-comet/pull/937

Re: [PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-09-16 Thread via GitHub
notfilippo commented on PR #11978: URL: https://github.com/apache/datafusion/pull/11978#issuecomment-2353519265 I'm back from vacation and I've rebased my PR to the latest upstream.

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Sep 9, 2024 [datafusion]

2024-09-16 Thread via GitHub
alamb closed issue #12391: DataFusion weekly project plan (Andrew Lamb) - Sep 9, 2024 URL: https://github.com/apache/datafusion/issues/12391

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Sep 9, 2024 [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12391: URL: https://github.com/apache/datafusion/issues/12391#issuecomment-2353526427 Next week: https://github.com/apache/datafusion/issues/12494

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Sep 16, 2024 [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12494: URL: https://github.com/apache/datafusion/issues/12494#issuecomment-2353527744 Review Queue DataFusion (bugs / improvements) - [ ] DataFusion (performance): - [ ] https://github.com/apache/datafusion/pull/12395 - [ ] https://github.com/

Re: [PR] feat: Support null safe equals in ExtractEquijoinPredicate [datafusion]

2024-09-16 Thread via GitHub
eejbyfeldt commented on code in PR #12458: URL: https://github.com/apache/datafusion/pull/12458#discussion_r1761592076 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -96,6 +96,7 @@ impl OptimizerRule for EliminateCrossJoin { filter.input.as_ref(),

Re: [PR] feat: Support null safe equals in ExtractEquijoinPredicate [datafusion]

2024-09-16 Thread via GitHub
eejbyfeldt commented on code in PR #12458: URL: https://github.com/apache/datafusion/pull/12458#discussion_r1761592476 ## datafusion/sqllogictest/test_files/join.slt: ## @@ -766,6 +766,50 @@ set datafusion.execution.target_partitions = 4; statement ok set datafusion.optimizer.

Re: [PR] feat: Support null safe equals in ExtractEquijoinPredicate [datafusion]

2024-09-16 Thread via GitHub
eejbyfeldt commented on PR #12458: URL: https://github.com/apache/datafusion/pull/12458#issuecomment-2353535642 Converted to draft until I have time to address the feedback and look more into whether this requires changes in other rules.

Re: [PR] fix: CometScanExec on Spark 3.5.2 [datafusion-comet]

2024-09-16 Thread via GitHub
parthchandra commented on code in PR #915: URL: https://github.com/apache/datafusion-comet/pull/915#discussion_r1761594517 ## spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala: ## @@ -141,8 +141,30 @@ case class CometScanExec( if (wrapped == null) Map.empt

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-16 Thread via GitHub
parthchandra commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1761600209 ## dev/release/build-release-comet.sh: ## @@ -0,0 +1,168 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contrib

Re: [PR] Add "Extended Clickbench" benchmark for median and approx_median for high cardinality aggregates [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12438: URL: https://github.com/apache/datafusion/pull/12438

Re: [PR] Add "Extended Clickbench" benchmark for median and approx_median for high cardinality aggregates [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12438: URL: https://github.com/apache/datafusion/pull/12438#issuecomment-2353557493 Thank you for the review @korowa

Re: [PR] feat(planner): Allowing setting sort order of parquet files without specifying the schema [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12466: URL: https://github.com/apache/datafusion/pull/12466#discussion_r1761626074 ## datafusion/sql/src/statement.rs: ## @@ -1028,8 +1030,26 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .into_iter() .collect(); -

Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12471: URL: https://github.com/apache/datafusion/pull/12471#discussion_r1761635636 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -739,7 +739,17 @@ impl ExecutionPlan for ParquetExec { } fn statistics(&self) -> Re

Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12471: URL: https://github.com/apache/datafusion/pull/12471#discussion_r1761644063 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -491,7 +491,22 @@ async fn parquet_statistics() -> Result<()> { // stats for the first col are read from t

Re: [PR] Add `array_dot_product` / `list_dot_product` function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12476: URL: https://github.com/apache/datafusion/pull/12476#issuecomment-2353611810 > Thank you, @dharanad , for bringing this to my attention. This is a great discussion. I like the idea of keeping the DataFusion core as simple as possible while retaining useful Duck

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-16 Thread via GitHub
itsjunetime commented on code in PR #12135: URL: https://github.com/apache/datafusion/pull/12135#discussion_r1761651689 ## datafusion/core/src/datasource/schema_adapter.rs: ## @@ -167,55 +186,95 @@ impl SchemaAdapter for DefaultSchemaAdapter { /// The SchemaMapping struct hol

Re: [I] [Epic] A collection of issues for extending the Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12254: URL: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046 @Weijun-H and @dmitrybugakov and @dharanad -- what do you think about creating a `datafusion-functions-duckdb` repo in datafusion-contrib similar to https://github.com/datafusi

Re: [I] Support `max_by` in Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12252: URL: https://github.com/apache/datafusion/issues/12252#issuecomment-2353615431 Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [I] Support `min_by` in Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12253: URL: https://github.com/apache/datafusion/issues/12253#issuecomment-2353615568 Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [I] Support `kurtosis(x)` in Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12250: URL: https://github.com/apache/datafusion/issues/12250#issuecomment-2353615661 Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [I] Support `mode` in Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12248: URL: https://github.com/apache/datafusion/issues/12248#issuecomment-2353615819 Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [I] Support `skewness(x)` in Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12249: URL: https://github.com/apache/datafusion/issues/12249#issuecomment-2353615736 Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [PR] Implement mode function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12385: URL: https://github.com/apache/datafusion/pull/12385#issuecomment-2353616053 Related comment: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [I] Support `entropy` in Aggregation function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on issue #12247: URL: https://github.com/apache/datafusion/issues/12247#issuecomment-2353616212 Suggestion: https://github.com/apache/datafusion/issues/12254#issuecomment-2353615046

Re: [PR] Refactor to support recursive unnest in physical plan [datafusion]

2024-09-16 Thread via GitHub
duongcongtoai commented on code in PR #11577: URL: https://github.com/apache/datafusion/pull/11577#discussion_r1761654336 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2921,17 +2922,53 @@ pub enum Partitioning { DistributeBy(Vec), } +#[derive(Debug, Clone, PartialE

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-16 Thread via GitHub
itsjunetime commented on code in PR #12135: URL: https://github.com/apache/datafusion/pull/12135#discussion_r1761656865 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -81,16 +81,11 @@ EXPLAIN select a from t_pushdown where b > 2 ORDER BY a; logic

Re: [PR] Improve doc wording around scalar authoring [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12478: URL: https://github.com/apache/datafusion/pull/12478

Re: [PR] date_trunc small update for readability [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12479: URL: https://github.com/apache/datafusion/pull/12479

Re: [PR] cleanup `array_has` [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12460: URL: https://github.com/apache/datafusion/pull/12460#issuecomment-2353629155 🚀

Re: [PR] cleanup `array_has` [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12460: URL: https://github.com/apache/datafusion/pull/12460

Re: [PR] cleanup `array_has` [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12460: URL: https://github.com/apache/datafusion/pull/12460#issuecomment-2353629437 Thanks @samuelcolvin and @jayzhan211

Re: [PR] chore: bump chrono to 0.4.38 [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12485: URL: https://github.com/apache/datafusion/pull/12485

Re: [PR] Remove deprecated ScalarUDF::new [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12487: URL: https://github.com/apache/datafusion/pull/12487

Re: [PR] Remove deprecated config setup functions [datafusion]

2024-09-16 Thread via GitHub
alamb merged PR #12486: URL: https://github.com/apache/datafusion/pull/12486

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-16 Thread via GitHub
itsjunetime commented on code in PR #12135: URL: https://github.com/apache/datafusion/pull/12135#discussion_r1761672152 ## datafusion/core/src/datasource/schema_adapter.rs: ## @@ -167,55 +186,95 @@ impl SchemaAdapter for DefaultSchemaAdapter { /// The SchemaMapping struct hol

Re: [PR] Add `array_dot_product` / `list_dot_product` function [datafusion]

2024-09-16 Thread via GitHub
austin362667 commented on PR #12476: URL: https://github.com/apache/datafusion/pull/12476#issuecomment-2353650381 Sure, thank you Andrew for proposing this initiative. I like the idea. Let's do it this way!!

Re: [PR] Make make_scalar_function() result candidate for inlining, by removing the `Arc` [datafusion]

2024-09-16 Thread via GitHub
comphead commented on PR #12477: URL: https://github.com/apache/datafusion/pull/12477#issuecomment-2353650988 > Makes sense to me -- thank you @findepi > > What do you think about removing `ScalarFunctionImplementation` entirely? Or is it still important to have a typedef around the `

Re: [PR] Update substrait requirement from 0.41 to 0.42, `prost-build` to `0.13.2` [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12483: URL: https://github.com/apache/datafusion/pull/12483#discussion_r1761675106 ## datafusion/proto/gen/Cargo.toml: ## @@ -35,4 +35,4 @@ workspace = true [dependencies] # Pin these dependencies so that the generated output is deterministic pb

Re: [I] [Epic] A collection of issues for extending the Aggregation function [datafusion]

2024-09-16 Thread via GitHub
austin362667 commented on issue #12254: URL: https://github.com/apache/datafusion/issues/12254#issuecomment-2353657029 Thank you @alamb for proposing this initiative. I like this idea. What do others think? It clearly draws a line between the `core` and the `extensions`. And we can

Re: [PR] implement max/min_by aggregate function [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12284: URL: https://github.com/apache/datafusion/pull/12284#issuecomment-2353671852 I wonder if this would be a good candidate to start building a new `datafusion-functions-spark` crate in https://github.com/datafusion-contrib? It would be really neat if we could star

Re: [PR] PartialOrd for Expr and sub fields/structs [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12481: URL: https://github.com/apache/datafusion/pull/12481#discussion_r1761697129 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -284,6 +346,15 @@ pub struct CreateCatalogSchema { pub schema: DFSchemaRef, } +impl PartialOrd for CreateCa

Re: [PR] feat(planner): Allowing setting sort order of parquet files without specifying the schema [datafusion]

2024-09-16 Thread via GitHub
devanbenz commented on code in PR #12466: URL: https://github.com/apache/datafusion/pull/12466#discussion_r1761713102 ## datafusion/sql/src/statement.rs: ## @@ -1028,8 +1030,26 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .into_iter() .collect();

Re: [PR] Remove unnecessary shifts in gcd() [datafusion]

2024-09-16 Thread via GitHub
Dandandan merged PR #12480: URL: https://github.com/apache/datafusion/pull/12480

Re: [PR] Improve SQLite subquery tables aliasing unparsing [datafusion]

2024-09-16 Thread via GitHub
alamb commented on PR #12482: URL: https://github.com/apache/datafusion/pull/12482#issuecomment-2353701964 cc @phillipleblanc

Re: [PR] Remove unnecessary shifts in gcd() [datafusion]

2024-09-16 Thread via GitHub
Dandandan commented on PR #12480: URL: https://github.com/apache/datafusion/pull/12480#issuecomment-2353700579 Thank you @findepi

Re: [PR] Fix: check ambiguous column reference [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12467: URL: https://github.com/apache/datafusion/pull/12467#discussion_r1761723046 ## datafusion/sql/src/expr/identifier.rs: ## @@ -186,7 +186,22 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let s = &ids[0..ids.len

Re: [PR] Specialize ASCII case for substr() [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12444: URL: https://github.com/apache/datafusion/pull/12444#discussion_r1761725094 ## datafusion/functions/src/unicode/substr.rs: ## @@ -186,6 +202,53 @@ fn make_and_append_view( null_builder.append_non_null(); } +// String characters are v

Re: [PR] Improve `trim` for string view [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1761764700 ## datafusion/functions/src/string/ltrim.rs: ## @@ -81,7 +81,11 @@ impl ScalarUDFImpl for LtrimFunc { } fn return_type(&self, arg_types: &[DataType]) ->

Re: [PR] Implement SHOW FUNCTIONS [datafusion]

2024-09-16 Thread via GitHub
matthewmturner commented on PR #12266: URL: https://github.com/apache/datafusion/pull/12266#issuecomment-2353747651 I'm not sure of the implementation details that get descriptions / signatures to flow through here (so maybe this is already handled) but I think it would be cool if functions reg

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-16 Thread via GitHub
alamb commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1761776248 ## datafusion/functions/src/encoding/inner.rs: ## @@ -49,17 +48,8 @@ impl Default for EncodeFunc { impl EncodeFunc { pub fn new() -> Self { -use Data

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-16 Thread via GitHub
andygrove commented on PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#issuecomment-2353772043 I ran the scripts locally and they seem to have worked. I ran this command: ``` ./dev/release/build-release-comet.sh -r https://github.com/parthchandra/datafusio
