Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901321935 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer(); l

[I] supports_filters_pushdown is invoked more than once on a single Custom Data Source [datafusion]

2025-01-02 Thread via GitHub
cisaacson opened a new issue, #13994: URL: https://github.com/apache/datafusion/issues/13994 ### Describe the bug The `supports_filters_pushdown` fn is invoked more than once on the same Custom Data Source for a query. Further, it is called with different `filters` list, so it is not

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-02 Thread via GitHub
jonahgao commented on code in PR #13964: URL: https://github.com/apache/datafusion/pull/13964#discussion_r1901427576 ## datafusion-cli/Cargo.lock: ## @@ -2395,12 +2395,12 @@ dependencies = [ [[package]] name = "indexmap" -version = "2.7.0" +version = "2.5.0" Review Comment:

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-02 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #13919: URL: https://github.com/apache/datafusion/pull/13919#discussion_r1901429188 ## datafusion/functions/src/encoding/inner.rs: ## @@ -126,10 +124,21 @@ impl ScalarUDFImpl for EncodeFunc { } fn documentation(&self) -> Option<&

[I] TPCDS 49, 70, 72, 86 Failed [datafusion]

2025-01-02 Thread via GitHub
djouallah opened a new issue, #13993: URL: https://github.com/apache/datafusion/issues/13993 ### Describe the bug from the 99 queries, 4 queries still fail ### To Reproduce https://drive.google.com/file/d/1NmrDyFRVNSCCsatwm77sery_s0yCaB-1/view?usp=sharing ### Expe

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on code in PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#discussion_r1901409389 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -84,6 +87,30 @@ class CometExecIterator( private var currentBatch: ColumnarBatch =

Re: [PR] feat: add `AsyncCatalogProvider` helpers for asynchronous catalogs [datafusion]

2025-01-02 Thread via GitHub
westonpace commented on PR #13800: URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2568592701 @alamb I've rebased and updated the example. I think the only remaining issue is your comment here: > When trying this API out I didn't fully understand this API (or what i

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568592582 > > Hmm, so it is correctness issue too? > > The correctness tests in CI seem to be passing so far .. I wonder if the SMJ is producing lots of empty batches? That would e

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568589066 Here is a PR against this PR to add the `CoalesceBatchesExec`: https://github.com/comphead/arrow-datafusion-comet/pull/1 -- This is an automated message from the Apache Git
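The idea behind wrapping the SMJ in a `CoalesceBatchesExec` (combining many small or empty batches into fewer larger ones) can be illustrated without Arrow. This is a minimal sketch, assuming a batch is just a `Vec<i64>` of rows. The `Batch` alias and the `coalesce_batches` function are hypothetical and are not DataFusion's API. Empty batches are skipped, and small batches are buffered until a target row count is reached.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch: just a vector of rows.
type Batch = Vec<i64>;

/// Buffer incoming batches and emit a combined batch once at least
/// `target_rows` rows have accumulated. Empty batches are dropped.
fn coalesce_batches<I>(input: I, target_rows: usize) -> Vec<Batch>
where
    I: IntoIterator<Item = Batch>,
{
    let mut output = Vec::new();
    let mut buffer: Batch = Vec::new();
    for batch in input {
        if batch.is_empty() {
            continue; // empty batches carry no rows, so skip them entirely
        }
        buffer.extend(batch);
        if buffer.len() >= target_rows {
            // Hand off the accumulated rows and start a fresh buffer.
            output.push(std::mem::take(&mut buffer));
        }
    }
    if !buffer.is_empty() {
        output.push(buffer); // flush whatever remains at end of stream
    }
    output
}
```

With a target of 3 rows, the inputs `[1]`, `[]`, `[2, 3]`, `[]`, `[4]`, `[5]` come out as `[1, 2, 3]` and `[4, 5]`. Downstream operators then see two batches instead of six.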

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568596132 > Something odd is going on. I have more data that will maybe help us understand this. > > I ran q21 with the code in this PR and then again with all SMJs wrapped in a `

Re: [PR] Update itertools requirement from 0.13 to 0.14 [datafusion]

2025-01-02 Thread via GitHub
jonahgao merged PR #13965: URL: https://github.com/apache/datafusion/pull/13965

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901414056 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer(); le

[PR] Refactor spill handling in GroupedHashAggregateStream to use partial … [datafusion]

2025-01-02 Thread via GitHub
kosiew opened a new pull request, #13995: URL: https://github.com/apache/datafusion/pull/13995 ## Which issue does this PR close? Closes #13949. ## Rationale for this change When an aggregation operator spills intermediate (partial) state to disk, it need
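The sketch below gives background on the "intermediate (partial) state" mentioned above. It assumes an AVG aggregate whose intermediate state is a `(sum, count)` pair per group. The `AvgState`, `merge_partial_states` and `finalize` names are hypothetical, and the sketch illustrates the general two-phase aggregation idea rather than the PR's implementation. Partial states read back from spills are merged state-by-state, with sums and counts added separately. Averaging the per-spill averages would give the wrong answer whenever the counts differ.

```rust
use std::collections::HashMap;

/// Hypothetical partial state for AVG: running sum and row count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    sum: f64,
    count: u64,
}

/// Merge partial states (one map per spill) into a single map per group key.
fn merge_partial_states(spills: &[HashMap<String, AvgState>]) -> HashMap<String, AvgState> {
    let mut merged: HashMap<String, AvgState> = HashMap::new();
    for spill in spills {
        for (key, state) in spill {
            let entry = merged
                .entry(key.clone())
                .or_insert(AvgState { sum: 0.0, count: 0 });
            // Combine states, not final values: add sums and counts separately.
            entry.sum += state.sum;
            entry.count += state.count;
        }
    }
    merged
}

/// Final evaluation: merged sum divided by merged count (NULL for no rows).
fn finalize(state: AvgState) -> Option<f64> {
    if state.count == 0 {
        None
    } else {
        Some(state.sum / state.count as f64)
    }
}
```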

[PR] add support for lists in min [datafusion]

2025-01-02 Thread via GitHub
rluvaton opened a new pull request, #13991: URL: https://github.com/apache/datafusion/pull/13991 ## Which issue does this PR close? Closes #13987. ## Rationale for this change You are now able to run min and max on lists ## What changes are included in this PR?
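The following is a minimal sketch of MIN/MAX over a column of lists. It assumes lexicographic ordering, which is how Rust's `Vec<T: Ord>` implements `Ord`, and it follows SQL's rule that NULL rows are ignored. The `min_list` and `max_list` helpers are hypothetical. How the PR orders lists that contain NULL elements is not stated in this excerpt, so element-level NULLs are not modeled.

```rust
/// MIN over a column of nullable lists. NULL rows are skipped (as SQL MIN does),
/// and lists compare lexicographically via `Vec`'s `Ord` impl: element by
/// element, with a strict prefix sorting before the longer list.
fn min_list(values: &[Option<Vec<i32>>]) -> Option<Vec<i32>> {
    values.iter().flatten().min().cloned()
}

/// MAX over the same column, with the same NULL and ordering rules.
fn max_list(values: &[Option<Vec<i32>>]) -> Option<Vec<i32>> {
    values.iter().flatten().max().cloned()
}
```

For the column `[1, 2, 3]`, NULL, `[1, 2]`, `[0, 9]`, the minimum is `[0, 9]` and the maximum is `[1, 2, 3]`. `[1, 2]` sorts before `[1, 2, 3]` because it is a prefix of it.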

Re: [I] [EPIC] Run full sqllogic / sqlite test suite against DataFusion [datafusion]

2025-01-02 Thread via GitHub
alamb commented on issue #13811: URL: https://github.com/apache/datafusion/issues/13811#issuecomment-2568357896 Thank you @2010YOUY01

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
mbutrovich commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901247700 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1650,18 +1719,44 @@ mod test { #[test] #[cfg_attr(miri, ignore)] // miri can't ca

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
viirya commented on code in PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#discussion_r1901368239 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -391,9 +391,6 @@ class CometJoinSuite extends CometTestBase { "AND tbl_

Re: [PR] feat: Add regexp_split_to_array function [datafusion]

2025-01-02 Thread via GitHub
github-actions[bot] closed pull request #13110: feat: Add regexp_split_to_array function URL: https://github.com/apache/datafusion/pull/13110

Re: [PR] Do not push down filter through distinct on [datafusion]

2025-01-02 Thread via GitHub
github-actions[bot] commented on PR #12943: URL: https://github.com/apache/datafusion/pull/12943#issuecomment-2568604118 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [WIP] Make ListingTableUrl allow direct construction [datafusion]

2025-01-02 Thread via GitHub
github-actions[bot] closed pull request #12981: [WIP] Make ListingTableUrl allow direct construction URL: https://github.com/apache/datafusion/pull/12981

Re: [PR] Improve deserialize_to_struct example [datafusion]

2025-01-02 Thread via GitHub
jonahgao commented on code in PR #13958: URL: https://github.com/apache/datafusion/pull/13958#discussion_r1901418145 ## datafusion-examples/examples/deserialize_to_struct.rs: ## @@ -15,62 +15,136 @@ // specific language governing permissions and limitations // under the Licens

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove merged PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021

Re: [I] Ballista 43.0.0 Release [datafusion-ballista]

2025-01-02 Thread via GitHub
milenkovicm commented on issue #974: URL: https://github.com/apache/datafusion-ballista/issues/974#issuecomment-2568306377 @andygrove should we release 43? I'm not sure if python release is in place, personally I don't see the point to release py-ballista in the current state. wdyt?

[PR] Another test of github action [datafusion]

2025-01-02 Thread via GitHub
alamb opened a new pull request, #13992: URL: https://github.com/apache/datafusion/pull/13992 Actual PR to test https://github.com/apache/datafusion/pull/13988 on my private fork of DataFusion

Re: [PR] Another test of github action [datafusion]

2025-01-02 Thread via GitHub
alamb closed pull request #13992: Another test of github action URL: https://github.com/apache/datafusion/pull/13992

Re: [PR] Move hash collision test to run only when merging to main [datafusion]

2025-01-02 Thread via GitHub
alamb commented on PR #13973: URL: https://github.com/apache/datafusion/pull/13973#issuecomment-2568331788 > > We want a blocking check on the merge so on failure merge fails. Perhaps it should run on approval? > > That is a great idea I thought about this some more. I think th

Re: [PR] feat: support `RightAnti` for `SortMergeJoin` [datafusion]

2025-01-02 Thread via GitHub
comphead commented on code in PR #13680: URL: https://github.com/apache/datafusion/pull/13680#discussion_r1901261200 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2910,6 +2992,310 @@ mod tests { Ok(()) } +#[tokio::test] +async fn join_r
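For readers new to anti joins, a `RightAnti` join returns the right-side rows that have no matching key on the left. The following is a minimal sketch of the sort-merge version on pre-sorted integer keys. The `sort_merge_right_anti` function is hypothetical. It ignores join filters, NULL keys and batching, all of which the real `SortMergeJoinExec` has to handle.

```rust
/// Right anti join on pre-sorted keys: return every right-side key that has
/// no equal key on the left side. Both inputs must be sorted ascending.
/// Duplicate right keys without a match are all kept, as anti-join semantics require.
fn sort_merge_right_anti(left: &[i32], right: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut i = 0; // cursor into `left`; it only ever moves forward
    for &r in right {
        // Advance the left cursor past keys smaller than the current right key.
        while i < left.len() && left[i] < r {
            i += 1;
        }
        // Keep the right row only if the left side has no matching key.
        if i == left.len() || left[i] != r {
            out.push(r);
        }
    }
    out
}
```

With left keys `[1, 3, 3, 5]` and right keys `[1, 2, 3, 4, 4, 6]`, the result is `[2, 4, 4, 6]`. Because both cursors only advance, the merge is linear in the combined input size.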

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-02 Thread via GitHub
dharanad commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1901261435 ## native/core/src/execution/planner.rs: ## @@ -719,6 +720,24 @@ impl PhysicalPlanner { expr.legacy_negative_index, )))

Re: [I] Update ballista logo [datafusion-ballista]

2025-01-02 Thread via GitHub
andygrove commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2568436692 > Should we conclude voting here? if i count correctly new no.3 has a bit more votes ? Sounds good to me

Re: [PR] feat: rand expression support [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r1901255361 ## native/core/src/execution/jni_api.rs: ## @@ -317,7 +317,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( // query pla

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2025-01-02 Thread via GitHub
timsaucer commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2568369016 I can definitely help and I have a PR to fix the spawn issue. I will be back to work on this next week if you can give me a few days.

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
mbutrovich commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901249341 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1570,13 +1588,41 @@ pub fn write_ipc_compressed( // write ipc_length placeholder ou
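The "write ipc_length placeholder" step in `write_ipc_compressed` reserves space for a length that is only known after the compressed payload has been written. The following is a minimal sketch of that general pattern using `std::io::Seek`. It assumes an 8-byte little-endian length prefix. The function name, the prefix width and the byte order are assumptions, not details taken from the Comet source.

```rust
use std::io::{Cursor, Seek, SeekFrom, Write};

/// Write a length-prefixed block whose length is not known up front:
/// reserve 8 bytes, write the payload, then seek back and fill in the length.
/// Returns the payload length that was recorded.
fn write_length_prefixed<W: Write + Seek>(out: &mut W, payload: &[u8]) -> std::io::Result<u64> {
    let start = out.stream_position()?;
    out.write_all(&0u64.to_le_bytes())?; // placeholder for the length
    out.write_all(payload)?; // in a real writer, the compressed IPC stream
    let end = out.stream_position()?;
    let len = end - start - 8; // bytes written after the placeholder
    out.seek(SeekFrom::Start(start))?;
    out.write_all(&len.to_le_bytes())?; // backfill the real length
    out.seek(SeekFrom::Start(end))?; // restore position for subsequent writes
    Ok(len)
}

// Usage sketch: `let mut buf = Cursor::new(Vec::<u8>::new());`
// then `write_length_prefixed(&mut buf, b"hello")?` leaves 8 + 5 bytes in `buf`.
#[allow(dead_code)]
fn _cursor_type_hint(_: Cursor<Vec<u8>>) {}
```

Restoring the position at the end lets several blocks be written back to back into the same stream.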

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on code in PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#discussion_r1901389792 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -391,9 +391,6 @@ class CometJoinSuite extends CometTestBase { "AND tb

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
mbutrovich commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901279652 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1570,13 +1588,41 @@ pub fn write_ipc_compressed( // write ipc_length placeholder ou

Re: [I] Update ballista logo [datafusion-ballista]

2025-01-02 Thread via GitHub
tbar4 commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2568495900 I think we should conclude.

Re: [I] Update ballista logo [datafusion-ballista]

2025-01-02 Thread via GitHub
andygrove commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2568499680 Thanks, everyone. It looks like we have chosen the new number 3. @pinarbayata Would you mind creating a PR to add the image files to the repo?

[PR] preserve sql formatting through a parse + display roundtrip (partial implementation) [datafusion-sqlparser-rs]

2025-01-02 Thread via GitHub
lovasoa opened a new pull request, #1636: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1636 this implements (a tiny portion of) https://github.com/apache/datafusion-sqlparser-rs/issues/1634 pros: really useful when passing formatted queries to a real database, in order

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568506081 I don't think that this PR closes https://github.com/apache/datafusion-comet/issues/398 since we still fall back to Spark for SMJ with join condition unless `spark.comet.exec

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901365468 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer(); le

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901370341 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer(); l

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
codecov-commenter commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568710371 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1211?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Release DataFusion `44.0.0` [datafusion]

2025-01-02 Thread via GitHub
niebayes commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2567440558 @alamb Hi, is the release for datafusion-cli delayed?

[I] datafusion-cli returns error prefix twice [datafusion]

2025-01-02 Thread via GitHub
niebayes opened a new issue, #13979: URL: https://github.com/apache/datafusion/issues/13979 ### Describe the bug ``` > select version(); +---+ | version() | +---

[PR] Update substrait requirement from 0.50 to 0.51 [datafusion]

2025-01-02 Thread via GitHub
dependabot[bot] opened a new pull request, #13978: URL: https://github.com/apache/datafusion/pull/13978 Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. Release notes Sourced from https://github.com/substrait-io/su

[PR] Update rstest requirement from 0.23.0 to 0.24.0 [datafusion]

2025-01-02 Thread via GitHub
dependabot[bot] opened a new pull request, #13977: URL: https://github.com/apache/datafusion/pull/13977 Updates the requirements on [rstest](https://github.com/la10736/rstest) to permit the latest version. Release notes Sourced from https://github.com/la10736/rstest/releases rste

Re: [I] Incorrect `NULL` handling in `BETWEEN` expression [datafusion]

2025-01-02 Thread via GitHub
getChan commented on issue #13976: URL: https://github.com/apache/datafusion/issues/13976#issuecomment-2567420902 take
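For reference, standard SQL defines `x BETWEEN low AND high` as `x >= low AND x <= high` under three-valued logic. The result is therefore FALSE, not NULL, whenever either comparison is FALSE. The following is a minimal sketch with `Option<bool>` standing in for SQL booleans and `Option<i64>` for nullable values. The helpers are hypothetical, and the exact misbehavior reported in #13976 is not shown in this excerpt.

```rust
/// SQL three-valued AND (Kleene logic): FALSE dominates, then NULL.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None, // at least one NULL and no FALSE
    }
}

/// Compare two nullable values: NULL if either side is NULL.
fn cmp3(a: Option<i64>, b: Option<i64>, op: impl Fn(i64, i64) -> bool) -> Option<bool> {
    Some(op(a?, b?))
}

/// `x BETWEEN low AND high` expanded to `x >= low AND x <= high`.
fn between(x: Option<i64>, low: Option<i64>, high: Option<i64>) -> Option<bool> {
    and3(cmp3(x, low, |a, b| a >= b), cmp3(x, high, |a, b| a <= b))
}
```

For example, `5 BETWEEN NULL AND 3` is FALSE because `5 <= 3` is FALSE, while `5 BETWEEN NULL AND 10` is NULL.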

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
Kontinuation commented on code in PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#discussion_r1901394431 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -84,6 +87,30 @@ class CometExecIterator( private var currentBatch: ColumnarBatc

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
Kontinuation commented on code in PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#discussion_r1901396076 ## native/core/src/execution/jni_api.rs: ## @@ -407,6 +560,20 @@ pub extern "system" fn Java_org_apache_comet_Native_releasePlan( ) { try_unwrap_or

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568530421 I am running benchmarks with this PR with `spark.comet.exec.sortMergeJoinWithJoinFilter.enabled=true`, and there appears to be a serious performance issue. TPC-H q21 has been

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568535644 > Hmm, so it is correctness issue too? The correctness tests in CI seem to be passing so far .. I wonder if the SMJ is producing lots of empty batches? That would explai

[I] Add encoding + compression metrics to columnar shuffle [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove opened a new issue, #1212: URL: https://github.com/apache/datafusion-comet/issues/1212 ### What is the problem the feature request solves? We track encoding metrics for native shuffle but not yet for columnar shuffle:

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568533415 > I am running benchmarks with this PR with `spark.comet.exec.sortMergeJoinWithJoinFilter.enabled=true`, and there appears to be a serious performance issue. TPC-H q21 has bee

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
viirya commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568533756 Due to the performance issue, we cannot make COMET_EXEC_SORT_MERGE_JOIN_WITH_JOIN_FILTER_ENABLED as true by default. We can only enable it for particular tests.

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
viirya commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568534245 Hmm, so it is correctness issue too?

Re: [PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1211: URL: https://github.com/apache/datafusion-comet/pull/1211#issuecomment-2568581478 Something odd is going on. I have more data that will maybe help us understand this. I ran q21 with the code in this PR and then again with all SMJs wrapped in a `Coale

Re: [PR] Optimize CASE expression for "expr or expr" usage. [datafusion]

2025-01-02 Thread via GitHub
aweltsch commented on PR #13953: URL: https://github.com/apache/datafusion/pull/13953#issuecomment-2567580400 Thanks for your feedback @alamb, I have added a new .slt test case in the file you mentioned. From my POV it should cover all relevant cases for the predicate (true, false, null) wi
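The three predicate cases mentioned above follow SQL semantics. Only a TRUE `WHEN` selects the `THEN` value, while FALSE and NULL both fall through to `ELSE`. The following is a minimal row-wise sketch of that selection. The `case_when` helper is hypothetical and uses plain slices rather than the Arrow arrays and `zip` kernel seen in the review.

```rust
/// Row-wise `CASE WHEN pred THEN then_v ELSE else_v END` over nullable inputs.
/// A NULL predicate is treated like FALSE, so the ELSE value is chosen.
fn case_when(
    pred: &[Option<bool>],
    then_v: &[Option<i64>],
    else_v: &[Option<i64>],
) -> Vec<Option<i64>> {
    pred.iter()
        .zip(then_v.iter().zip(else_v.iter()))
        .map(|(p, (t, e))| if *p == Some(true) { *t } else { *e })
        .collect()
}
```

With predicates TRUE, FALSE, NULL, THEN values `1, 2, 3` and ELSE values `10, NULL, 30`, the result is `1, NULL, 30`.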

Re: [PR] Optimize CASE expression for "expr or expr" usage. [datafusion]

2025-01-02 Thread via GitHub
2010YOUY01 commented on code in PR #13953: URL: https://github.com/apache/datafusion/pull/13953#discussion_r1900793486 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -394,6 +401,43 @@ impl CaseExpr { Ok(ColumnarValue::Array(zip(&when_value, &then_value, &e

[PR] Consolidate csv_opener.rs and json_opener.rs into a single example (#… [datafusion]

2025-01-02 Thread via GitHub
cj-zhukov opened a new pull request, #13981: URL: https://github.com/apache/datafusion/pull/13981 …13955) ## Which issue does this PR close? Closes #13955. ## Rationale for this change ## What changes are included in this PR? ## Are t

Re: [PR] Add swap_inputs to SMJ [datafusion]

2025-01-02 Thread via GitHub
berkaysynnada merged PR #13984: URL: https://github.com/apache/datafusion/pull/13984

Re: [PR] Find a way to communicate the ordering of a file back with the existi… [datafusion]

2025-01-02 Thread via GitHub
zhuqi-lucas commented on PR #13933: URL: https://github.com/apache/datafusion/pull/13933#issuecomment-2567943900 > Hi @zhuqi-lucas -- sorry if we caused confusion here. I agree with @berkaysynnada and @ozankabak that ordering information is already represented in plans using [`EquivalenceP

Re: [I] Update supported Spark and Java versions in installation guide [datafusion-comet]

2025-01-02 Thread via GitHub
hayman42 commented on issue #742: URL: https://github.com/apache/datafusion-comet/issues/742#issuecomment-2567949641 Hi @zemin-piao, Just to let you know in case you still have issues. I faced the same error and ended up with [building from source](https://datafusion.apache.org/come

Re: [I] Add H2O.ai Database-like Ops benchmark to `dfbench` [datafusion]

2025-01-02 Thread via GitHub
zhuqi-lucas commented on issue #7209: URL: https://github.com/apache/datafusion/issues/7209#issuecomment-2568768340 take

Re: [I] sql result discrepency with sqlite, postgres and duckdb bug #2 [datafusion]

2025-01-02 Thread via GitHub
aweltsch commented on issue #13782: URL: https://github.com/apache/datafusion/issues/13782#issuecomment-2568812549 Out of interest I looked into this a bit and wanted to share my findings. To me it looks like this might be related to the type coercion of `NULLIF`. I reduced the example
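For reference, `NULLIF(a, b)` returns NULL when `a = b` and returns `a` otherwise. Because `a = NULL` is never TRUE, a NULL second argument returns `a` unchanged. The following is a minimal sketch with both arguments fixed to `i64`. The `nullif` helper is hypothetical and deliberately does not model the type coercion suspected in the comment.

```rust
/// SQL NULLIF(a, b): NULL when a = b, otherwise a.
/// If either side is NULL the equality is not TRUE, so `a` is returned as-is.
fn nullif(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        (Some(x), Some(y)) if x == y => None,
        _ => a,
    }
}
```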

Re: [PR] WIP: Proposed interface for physical plan invariant checking. [datafusion]

2025-01-02 Thread via GitHub
alamb commented on code in PR #13986: URL: https://github.com/apache/datafusion/pull/13986#discussion_r1901304742 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -110,6 +110,16 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// trait, which is implem

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901305952 ## native/core/Cargo.toml: ## @@ -52,6 +52,9 @@ serde = { version = "1", features = ["derive"] } lazy_static = "1.4.0" prost = "0.12.1" jni = "0.21" +snap

Re: [PR] WIP: Proposed interface for physical plan invariant checking. [datafusion]

2025-01-02 Thread via GitHub
alamb commented on code in PR #13986: URL: https://github.com/apache/datafusion/pull/13986#discussion_r1901305754 ## datafusion/core/src/physical_planner.rs: ## @@ -2006,6 +2001,45 @@ fn tuple_err(value: (Result, Result)) -> Result<(T, R)> { } } +#[derive(Default)] +str

[PR] chore: update to DF.44 [datafusion-ballista]

2025-01-02 Thread via GitHub
milenkovicm opened a new pull request, #1153: URL: https://github.com/apache/datafusion-ballista/pull/1153 # Which issue does this PR close? Closes none. # Rationale for this change Keeping up with DF release cycle # What changes are included in this PR? -

Re: [I] Update ballista logo [datafusion-ballista]

2025-01-02 Thread via GitHub
milenkovicm commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2568302510 Should we conclude voting here? if i count correctly new no.3 has a bit more votes ?

Re: [PR] Change trigger, rename `hash_collision.yml` to `extended.yml` and add comments [datafusion]

2025-01-02 Thread via GitHub
alamb commented on PR #13988: URL: https://github.com/apache/datafusion/pull/13988#issuecomment-2568328176 I am now convinced this works the way I expect -- the test is reported on the main commit -- see https://github.com/alamb/datafusion/pull/25

Re: [PR] Optimize CASE expression for "expr or expr" usage. [datafusion]

2025-01-02 Thread via GitHub
aweltsch commented on PR #13953: URL: https://github.com/apache/datafusion/pull/13953#issuecomment-2568305530 I added a follow-up issue #13990 I hope it is worded clearly and accurately reflects the changes desired. @2010YOUY01 feel free to chime in.

Re: [PR] [do-not-merge] Diff updated comet-parquet-exec feature branch against main [datafusion-comet]

2025-01-02 Thread via GitHub
mbutrovich closed pull request #1182: [do-not-merge] Diff updated comet-parquet-exec feature branch against main URL: https://github.com/apache/datafusion-comet/pull/1182

[PR] feat: Reenable tests for filtered SMJ anti join [datafusion-comet]

2025-01-02 Thread via GitHub
comphead opened a new pull request, #1211: URL: https://github.com/apache/datafusion-comet/pull/1211 ## Which issue does this PR close? Checking that filtered SMJ antijoin works correctly in Comet after fixes in the DataFusion Closes #398 #861 #891. ## Rationale

[PR] Add support for USE SECONDARY ROLE (vs. ROLES) [datafusion-sqlparser-rs]

2025-01-02 Thread via GitHub
yoavcloud opened a new pull request, #1637: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1637 (no comment)

Re: [PR] feat: support `RightAnti` for `SortMergeJoin` [datafusion]

2025-01-02 Thread via GitHub
irenjj commented on code in PR #13680: URL: https://github.com/apache/datafusion/pull/13680#discussion_r1901465804 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2910,6 +2992,310 @@ mod tests { Ok(()) } +#[tokio::test] +async fn join_rig

Re: [I] Functionality of `array_repeat` udf [datafusion]

2025-01-02 Thread via GitHub
jatin510 commented on issue #13872: URL: https://github.com/apache/datafusion/issues/13872#issuecomment-2567574580 > I think we can return null for this case should we make this change ? @jayzhan211 @alamb

Re: [PR] Optimize CASE expression for "expr or expr" usage. [datafusion]

2025-01-02 Thread via GitHub
aweltsch commented on code in PR #13953: URL: https://github.com/apache/datafusion/pull/13953#discussion_r1900836680 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -394,6 +401,43 @@ impl CaseExpr { Ok(ColumnarValue::Array(zip(&when_value, &then_value, &els

[PR] Include license and notice files in more crates [datafusion]

2025-01-02 Thread via GitHub
ankane opened a new pull request, #13985: URL: https://github.com/apache/datafusion/pull/13985 ## Which issue does this PR close? None ## Rationale for this change Applies #13512 to more (remaining) crates. ## What changes are included in this PR? For the `d

Re: [PR] feat: rand expression support [datafusion-comet]

2025-01-02 Thread via GitHub
mbutrovich commented on PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#issuecomment-2567981055 Are the partition related changes necessary for this PR? Otherwise, it might be better to reduce the scope to just the `rand()` expression.

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2568390951 Thank you! Feel free to ping me if you need any help setting up the env for iceberg-rust.

[I] Simplify error handling in case.rs [datafusion]

2025-01-02 Thread via GitHub
aweltsch opened a new issue, #13990: URL: https://github.com/apache/datafusion/issues/13990 ### Is your feature request related to a problem or challenge? As part of the review for #13953 @2010YOUY01 brought up that some of the error handling code in `datafusion/physical-expr/src/expr

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901265247 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1650,18 +1719,44 @@ mod test { #[test] #[cfg_attr(miri, ignore)] // miri can't cal

Re: [PR] doc-gen: migrate scalar functions (string) documentation 2/4 [datafusion]

2025-01-02 Thread via GitHub
comphead commented on PR #13925: URL: https://github.com/apache/datafusion/pull/13925#issuecomment-2568282373 @Chen-Yuan-Lai please take latest updates from your branch and run `./dev/update_function_docs.sh`. I think this way we fix all the mismatches.

Re: [PR] WIP: Proposed interface for physical plan invariant checking. [datafusion]

2025-01-02 Thread via GitHub
ozankabak commented on code in PR #13986: URL: https://github.com/apache/datafusion/pull/13986#discussion_r1901204911 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -110,6 +110,16 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// trait, which is im

Re: [PR] feat: add support for array_contains expression [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove merged PR #1163: URL: https://github.com/apache/datafusion-comet/pull/1163

Re: [PR] chore: Add safety check to CometBuffer [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1050: URL: https://github.com/apache/datafusion-comet/pull/1050#issuecomment-2568283985 I will start reviewing this PR today.

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-02 Thread via GitHub
dharanad commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1901263745 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2517,4 +2517,16 @@ class CometExpressionSuite extends CometTestBase with Adaptive

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-02 Thread via GitHub
comphead commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1901317525 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer(); le

Re: [PR] Custom scalar to sql overrides support for DuckDB Unparser dialect [datafusion]

2025-01-02 Thread via GitHub
sgrebnov commented on PR #13915: URL: https://github.com/apache/datafusion/pull/13915#issuecomment-2568661287 @goldmedal - thank you for the feedback. Yeah, other dialects will benefit from similar logic as well. I was exploring how to make this work for all dialects, but it is not possibl

Re: [PR] feat: add support for array_contains expression [datafusion-comet]

2025-01-02 Thread via GitHub
dharanad commented on code in PR #1163: URL: https://github.com/apache/datafusion-comet/pull/1163#discussion_r1900779871 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2390,4 +2390,14 @@ class CometExpressionSuite extends CometTestBase with Adaptive

[PR] Minor: sort requirement check for `Last` function's `merge_batch` [datafusion]

2025-01-02 Thread via GitHub
jayzhan211 opened a new pull request, #13980: URL: https://github.com/apache/datafusion/pull/13980 ## Which issue does this PR close? Closes #. ## Rationale for this change We also check `requirement_satisfied` for `update_batch`, add it for `merge_batch`.

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#issuecomment-2567894661 I also had to remove the off-heap check here: ```scala private[comet] def isCometShuffleEnabled(conf: SQLConf): Boolean = COMET_EXEC_SHUFFLE_ENABLED.get(conf)

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#issuecomment-2567903221 I also had to make a change in `CometShuffleMemoryAllocatorTrait` and I am now able to test. I will post results later today. ``` -if (isSparkTesting && !useUnified

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on code in PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#discussion_r1900936539 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -467,6 +467,15 @@ object CometConf extends ShimCometConf { .booleanConf .creat

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on code in PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#discussion_r1900936196 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -467,6 +467,15 @@ object CometConf extends ShimCometConf { .booleanConf .creat

Re: [PR] feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [datafusion-comet]

2025-01-02 Thread via GitHub
andygrove commented on PR #1021: URL: https://github.com/apache/datafusion-comet/pull/1021#issuecomment-2567889786 I had to remove the following code from `CometSparkSessionExtensions` to test with off-heap disabled. We can remove this in a follow up PR as well. ```scala if

Re: [PR] Include license and notice files in more crates [datafusion]

2025-01-02 Thread via GitHub
alamb merged PR #13985: URL: https://github.com/apache/datafusion/pull/13985

Re: [PR] docs: Add datafusion python 43.1.0 blog post to events page [datafusion]

2025-01-02 Thread via GitHub
alamb merged PR #13974: URL: https://github.com/apache/datafusion/pull/13974
