Re: [PR] Fix MySQL parsing of GRANT, REVOKE, and CREATE VIEW [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
yoavcloud commented on code in PR #1538: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1538#discussion_r1908272974 ## src/parser/mod.rs: ## @@ -11808,23 +11899,32 @@ impl<'a> Parser<'a> { } } +pub fn parse_grantee_name(&mut self) -> Result { +

Re: [PR] show a mismatch for initcap between Spark and DataFusion [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on PR #1051: URL: https://github.com/apache/datafusion-comet/pull/1051#issuecomment-2579336663 #1052 should be already fixed with the DF44 release. Would you like to rebase and re-trigger this test? @Blizzara -- This is an automated message from the Apache Git

Re: [I] Initcap behaves differently in Spark and in DataFusion (also Comet) [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on issue #1052: URL: https://github.com/apache/datafusion-comet/issues/1052#issuecomment-2579333075 The fix should be already included in https://github.com/apache/datafusion/commits/branch-44/

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on PR #981: URL: https://github.com/apache/datafusion-python/pull/981#issuecomment-2579331381 Does anyone know how to fix this error: ``` ruff check --output-format=github python/ ruff format --check python/ shell: /usr/bin/bash -e {0} env: py

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2579313671 We also have checkpoint tests that drop the stream after some amount of time, and after the failure, the FileStream offsets stop incrementing. I think the same

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jonahgao commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579269061 > Why haven’t we been displaying `null` values as `NULL` so far? What was the original reasoning or intention behind this decision? I guess it's to follow PostgreSQL CLI. Post

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jatin510 commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579141610 Why haven’t we been displaying null values as NULL so far? What was the original reasoning or intention behind this decision?

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jatin510 commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579130276 > I think we should fix it on the display/formatting side. For example, we still cannot distinguish: > > ``` > DataFusion CLI v44.0.0 > > > select array[], array[nul

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jayzhan211 commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579134148 > > I think we should fix it on the display/formatting side. For example, we still cannot distinguish: > > ``` > > DataFusion CLI v44.0.0 > > > > > select array[], arr

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-08 Thread via GitHub
zhuqi-lucas commented on PR #13996: URL: https://github.com/apache/datafusion/pull/13996#issuecomment-2579118558 > Thank you @zhuqi-lucas and @2010YOUY01 > > I tried this out locally and it worked really nicely. Thank you > > I think the following follow on tasks would be valuab

Re: [I] Doc attribution: make `user_doc` to work with predefined consts. [datafusion]

2025-01-08 Thread via GitHub
Chen-Yuan-Lai commented on issue #14001: URL: https://github.com/apache/datafusion/issues/14001#issuecomment-2579088412 @comphead if doc_section becomes a string, will the user_doc macro syntax change for all functions? Before ```rust #[user_doc( doc_section(label = "String F

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
byte-sourcerer commented on PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#issuecomment-2579081807 Hi @iffyio, I have revised the PR. Could you please review it again?

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
byte-sourcerer commented on code in PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#discussion_r1908100861 ## src/ast/dml.rs: ## @@ -470,8 +470,7 @@ pub struct Insert { /// INTO - optional keyword pub into: bool, /// TABLE -#[cfg_at

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jonahgao commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579073819 I think we should fix it on the display/formatting side. For example, we still cannot distinguish: ``` DataFusion CLI v44.0.0 > select array[], array[null]; +

Re: [PR] Minor: Document output schema of LogicalPlan::Aggregate and LogicalPl… [datafusion]

2025-01-08 Thread via GitHub
jonahgao merged PR #14047: URL: https://github.com/apache/datafusion/pull/14047

Re: [I] Functionality of `array_repeat` udf [datafusion]

2025-01-08 Thread via GitHub
jonahgao commented on issue #13872: URL: https://github.com/apache/datafusion/issues/13872#issuecomment-2579050717 It seems to be a display/formatting issue to me. We should always display NULLs within containers, like Postgres and DuckDB do. ```psql psql=> select ARRAY[null]; arr
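To make the display/formatting argument concrete: if NULL elements inside a container are always spelled out, `array[]` and `array[null]` render differently. The sketch below is a hypothetical formatter, not DataFusion's actual display code; it assumes a list modeled as `Option<Vec<Option<i64>>>` (outer `None` is a NULL list, inner `None` is a NULL element) instead of an Arrow array.

```rust
/// Hypothetical formatter: render a nullable list of nullable integers.
/// Outer `None` is a NULL list; inner `None` values are NULL elements.
fn format_list(value: &Option<Vec<Option<i64>>>) -> String {
    match value {
        // A NULL list is rendered as the bare word NULL.
        None => "NULL".to_string(),
        Some(items) => {
            let parts: Vec<String> = items
                .iter()
                .map(|item| match item {
                    Some(v) => v.to_string(),
                    // Always spell out NULL inside containers so `[NULL]` differs from `[]`.
                    None => "NULL".to_string(),
                })
                .collect();
            format!("[{}]", parts.join(", "))
        }
    }
}

fn main() {
    println!("{}", format_list(&Some(vec![]))); // []
    println!("{}", format_list(&Some(vec![None]))); // [NULL]
    println!("{}", format_list(&None)); // NULL
    println!("{}", format_list(&Some(vec![Some(1), None]))); // [1, NULL]
}
```

Top-level NULLs could still follow a CLI convention such as an empty cell; the point is only that values nested in a container need an explicit marker.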

Re: [PR] Clickhouse SQL generation for datatypes. [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
github-actions[bot] closed pull request #1482: Clickhouse SQL generation for datatypes. URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1482

Re: [PR] feat(substrait): add support for insert roundtrip in append mode [datafusion]

2025-01-08 Thread via GitHub
github-actions[bot] closed pull request #13118: feat(substrait): add support for insert roundtrip in append mode URL: https://github.com/apache/datafusion/pull/13118

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1908071361 ## python/datafusion/dataframe.py: ## @@ -35,6 +35,65 @@ from datafusion._internal import DataFrame as DataFrameInternal from datafusion.expr import Expr, So

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1908071116 ## python/datafusion/dataframe.py: ## @@ -620,17 +679,34 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2579022385 Can you confirm whether this is related to columnar shuffle by disabling it? This looks to me like a NaN normalization issue (clearly the Rust NaNs are being counted as not equa
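For context on the NaN hypothesis: under IEEE 754 `NaN != NaN`, and different NaNs can carry different bit patterns, whereas Spark treats all NaNs as equal for grouping and joins (and `-0.0` as equal to `0.0`). A minimal sketch of normalizing floats before hashing by bit pattern, assuming plain `f64` values rather than Arrow arrays; `normalize_f64` and `count_by_bits` are hypothetical helpers, not Comet code.

```rust
use std::collections::HashMap;

/// Map every NaN to one canonical NaN and -0.0 to 0.0 so that
/// bitwise equality/hashing groups them the way Spark does.
fn normalize_f64(v: f64) -> f64 {
    if v.is_nan() {
        f64::NAN
    } else if v == 0.0 {
        // True for both 0.0 and -0.0.
        0.0
    } else {
        v
    }
}

/// Count occurrences of each value, keyed by the bit pattern of the normalized float.
fn count_by_bits(values: &[f64]) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for &v in values {
        *counts.entry(normalize_f64(v).to_bits()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Two NaNs with different payloads: their raw bits differ.
    let nan_a = f64::from_bits(0x7ff8_0000_0000_0000);
    let nan_b = f64::from_bits(0x7ff8_0000_0000_0001);
    assert!(nan_a.is_nan() && nan_b.is_nan());
    assert_ne!(nan_a.to_bits(), nan_b.to_bits());

    let counts = count_by_bits(&[nan_a, nan_b, 0.0, -0.0]);
    // After normalization: one NaN group and one zero group.
    assert_eq!(counts.len(), 2);
    assert_eq!(counts[&f64::NAN.to_bits()], 2);
    assert_eq!(counts[&0.0f64.to_bits()], 2);
}
```

Without the `normalize_f64` call, the same input would produce four distinct keys.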

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1908068498 ## python/datafusion/dataframe.py: ## @@ -620,17 +679,34 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [I] Potential bug in columnar shuffle [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on issue #1238: URL: https://github.com/apache/datafusion-comet/issues/1238#issuecomment-2579015443 Hmm, could this be an aggregation bug?

Re: [PR] test: Enable shuffle by default in Spark tests [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on code in PR #1240: URL: https://github.com/apache/datafusion-comet/pull/1240#discussion_r1908064532 ## dev/diffs/3.4.3.diff: ## @@ -2880,23 +2874,22 @@ index ed2e309fa07..59adc094970 100644 +conf + .set("spark.comet.exec.enabled", "true")

[PR] test: Enable shuffle by default in Spark tests [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura opened a new pull request, #1240: URL: https://github.com/apache/datafusion-comet/pull/1240 ## Which issue does this PR close? ## Rationale for this change Because `isCometShuffleEnabled` is false by default, some tests were not reached ## What changes a

Re: [I] java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager [datafusion-comet]

2025-01-08 Thread via GitHub
ramyadass commented on issue #864: URL: https://github.com/apache/datafusion-comet/issues/864#issuecomment-2579003972 @nblagodarnyi , I've tried to use Comet with Spark using two different commands, but I'm encountering the same error in both cases. 1. Using local jar:

Re: [I] Doc attribution: make `user_doc` to work with predefined consts. [datafusion]

2025-01-08 Thread via GitHub
ding-young commented on issue #14001: URL: https://github.com/apache/datafusion/issues/14001#issuecomment-2578996923 @comphead Yes, I've taken a brief look at `scalar_doc_sections::DOC_SECTION_STRING` and the `user_doc` macro, and I plan to make the necessary code changes this weekend.

Re: [PR] [substrait] Add support for ExtensionTable [datafusion]

2025-01-08 Thread via GitHub
vbarua commented on PR #13772: URL: https://github.com/apache/datafusion/pull/13772#issuecomment-2578996616 > With the latest changes, supporting extension tables no longer requires a fork of datafusion, but merely a custom implementation for the new traits. Nice! > However, th

Re: [PR] Custom scalar to sql overrides support for DuckDB Unparser dialect [datafusion]

2025-01-08 Thread via GitHub
goldmedal commented on PR #13915: URL: https://github.com/apache/datafusion/pull/13915#issuecomment-2578995368 Thanks @sgrebnov. Looks great 👍 > Note: please let me know if you prefer me to add support for all dialects as part of this PR - I'll be able to do this: add impl + tests.

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-08 Thread via GitHub
goldmedal commented on PR #14031: URL: https://github.com/apache/datafusion/pull/14031#issuecomment-2578985631 Thanks @MohamedAbdeen21 and @phillipleblanc for the review

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-08 Thread via GitHub
goldmedal merged PR #14031: URL: https://github.com/apache/datafusion/pull/14031

Re: [I] Unparse `UNION` plan with multiple inputs to SQL text [datafusion]

2025-01-08 Thread via GitHub
goldmedal closed issue #13621: Unparse `UNION` plan with multiple inputs to SQL text URL: https://github.com/apache/datafusion/issues/13621

Re: [I] [EPIC] A collection of items to improve developer / CI speed [datafusion]

2025-01-08 Thread via GitHub
Omega359 commented on issue #13813: URL: https://github.com/apache/datafusion/issues/13813#issuecomment-2578956224 FYI, I came across [this today](https://github.com/rust-lang/rust/pull/126245) with a great example of how much faster doctests can be in [this comment](https://github.com/rus

Re: [PR] chore: Improve shuffle configuration [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on code in PR #1207: URL: https://github.com/apache/datafusion-comet/pull/1207#discussion_r1908011437 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -242,17 +241,17 @@ object CometConf extends ShimCometConf { .booleanConf .crea

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2578930234 > > > > > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promotion

Re: [PR] chore: Improve shuffle configuration [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on code in PR #1207: URL: https://github.com/apache/datafusion-comet/pull/1207#discussion_r1908004404 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -242,17 +241,17 @@ object CometConf extends ShimCometConf { .booleanConf .c

[PR] ignore: just testing something [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove opened a new pull request, #1239: URL: https://github.com/apache/datafusion-comet/pull/1239 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Potential bug in columnar shuffle [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1238: URL: https://github.com/apache/datafusion-comet/issues/1238#issuecomment-2578892598 I can reproduce the issue in `main` so I am confused how this is currently passing when we run Spark SQL tests. Any idea @kazuyukitanimura or @parthchandra? Here i

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2025-01-08 Thread via GitHub
stuhood commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2578874963 (take with a grain of salt because I haven't worked with `block_in_place` inside of hot loops) > The performance implications definitely concern me, I have a nagging susp

Re: [PR] Fix MySQL parsing of GRANT, REVOKE, and CREATE VIEW [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
mvzink commented on code in PR #1538: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1538#discussion_r1907967357 ## src/parser/mod.rs: ## @@ -11808,23 +11899,32 @@ impl<'a> Parser<'a> { } } +pub fn parse_grantee_name(&mut self) -> Result { +

Re: [PR] Fix MySQL parsing of GRANT, REVOKE, and CREATE VIEW [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
mvzink commented on PR #1538: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1538#issuecomment-2578858394 @iffyio rebased, but there were some significant changes to grantee parsing by @yoavcloud, so maybe he should take a look too.

Re: [PR] [substrait] Add support for ExtensionTable [datafusion]

2025-01-08 Thread via GitHub
ccciudatu commented on PR #13772: URL: https://github.com/apache/datafusion/pull/13772#issuecomment-2578841705 @vbarua @Blizzara I finally got back to figure out whether it still makes sense with the new APIs. With the latest changes, supporting extension tables no longer requires a fork

Re: [PR] perf: Improve query planning to more reliably fall back to columnar shuffle when native shuffle is not supported [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on PR #1209: URL: https://github.com/apache/datafusion-comet/pull/1209#issuecomment-2578822397 Summary of Spark SQL test failures: | Test | Failure | |-|-| | reverse preceding/following range between with aggregation | CometNativeException: Invalid argument

[I] Potential bug in columnar shuffle [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove opened a new issue, #1238: URL: https://github.com/apache/datafusion-comet/issues/1238 ### Describe the bug In https://github.com/apache/datafusion-comet/pull/1209 we now fall back to columnar shuffle in some cases where native shuffle is not supported, rather than just fal

[I] Optimize filtered SortMergeJoin to avoid producing small/empty batches [datafusion]

2025-01-08 Thread via GitHub
comphead opened a new issue, #14050: URL: https://github.com/apache/datafusion/issues/14050 ### Is your feature request related to a problem or challenge? Related to #9846 In #9846 there are a couple of tasks to fix the correctness issues for SortMergeJoin with filter clause
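One general way to avoid emitting small or empty batches is to buffer filtered output until a target row count is reached, similar in spirit to DataFusion's `CoalesceBatchesExec`. This is a minimal sketch of that idea, not the approach chosen for #14050; it assumes a `Vec<i64>` stands in for a `RecordBatch`, and `Coalescer` is a hypothetical type.

```rust
/// Hypothetical coalescer: buffers small (possibly empty) batches and emits
/// a batch once at least `target_rows` rows are buffered.
/// A `Vec<i64>` stands in for a RecordBatch.
struct Coalescer {
    target_rows: usize,
    buffer: Vec<i64>,
}

impl Coalescer {
    fn new(target_rows: usize) -> Self {
        Self { target_rows, buffer: Vec::new() }
    }

    /// Buffer `batch`; return a full batch once enough rows have accumulated.
    fn push(&mut self, batch: Vec<i64>) -> Option<Vec<i64>> {
        // Empty batches (e.g. every row rejected by the join filter) add nothing.
        self.buffer.extend(batch);
        if self.buffer.len() >= self.target_rows {
            Some(std::mem::take(&mut self.buffer))
        } else {
            None
        }
    }

    /// Flush whatever is left at the end of input, if anything.
    fn finish(&mut self) -> Option<Vec<i64>> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}

fn main() {
    let mut c = Coalescer::new(4);
    let inputs = vec![vec![1], vec![], vec![2, 3], vec![], vec![4, 5], vec![6]];
    let mut out = Vec::new();
    for b in inputs {
        if let Some(full) = c.push(b) {
            out.push(full);
        }
    }
    if let Some(rest) = c.finish() {
        out.push(rest);
    }
    println!("{:?}", out); // [[1, 2, 3, 4, 5], [6]]
}
```

Six input batches (two of them empty) become two output batches; a real operator would also concatenate Arrow arrays and might split oversized batches.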

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1907925883 ## python/datafusion/dataframe.py: ## @@ -620,17 +679,34 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_p

Re: [PR] Improve performance of `reverse` function [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14025: URL: https://github.com/apache/datafusion/pull/14025#issuecomment-2578781820 Nice work @tlm365 @2010YOUY01 and @simonvandel ❤️

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1907927684 ## native/core/src/execution/shuffle/codec.rs: ## @@ -0,0 +1,675 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1907927851 ## native/core/benches/shuffle_writer.rs: ## @@ -31,67 +31,52 @@ use std::sync::Arc; use tokio::runtime::Runtime; fn criterion_benchmark(c: &mut Criterio

Re: [I] Doc attribution: make `user_doc` to work with predefined consts. [datafusion]

2025-01-08 Thread via GitHub
comphead commented on issue #14001: URL: https://github.com/apache/datafusion/issues/14001#issuecomment-2578777419 Hi @ding-young, are you still planning to work on this issue?

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-08 Thread via GitHub
comphead commented on PR #13919: URL: https://github.com/apache/datafusion/pull/13919#issuecomment-2578775617 It's not ready yet; it can be fixed by #14001 (preferable), or alternatively I can make a manual correction. I'm moving this to draft for now.

Re: [PR] Support async iteration of RecordBatchStream [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer commented on code in PR #975: URL: https://github.com/apache/datafusion-python/pull/975#discussion_r1907919652 ## python/datafusion/record_batch.py: ## @@ -59,18 +59,22 @@ def __init__(self, record_batch_stream: df_internal.RecordBatchStream) -> None: def next(

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #13919: URL: https://github.com/apache/datafusion/pull/13919#issuecomment-2578752052 Is this PR ready to go? Or are we waiting for something else to finish it up?

[I] Add pytest-asyncio unit tests [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer opened a new issue, #991: URL: https://github.com/apache/datafusion-python/issues/991 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This is a follow on to https://github.com/apache/datafusion-python/pull/975 where

Re: [PR] chore: extract json_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove merged PR #1220: URL: https://github.com/apache/datafusion-comet/pull/1220

Re: [I] Should ScanExec use Spark-compatible cast instead of DataFusion cast? [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #803: URL: https://github.com/apache/datafusion-comet/issues/803#issuecomment-2578748469 I agree that SchemaAdapter is now the solution. I will close this issue.

Re: [I] Should ScanExec use Spark-compatible cast instead of DataFusion cast? [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove closed issue #803: Should ScanExec use Spark-compatible cast instead of DataFusion cast? URL: https://github.com/apache/datafusion-comet/issues/803

Re: [I] Missing "INFO" log level [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1127: URL: https://github.com/apache/datafusion-comet/issues/1127#issuecomment-2578746640 I could reproduce the issue when building from the `0.4.0` tag, but I do see the logging when I enable off-heap memory so the issue appears to be that we were silently di

Re: [I] Missing "INFO" log level [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove closed issue #1127: Missing "INFO" log level URL: https://github.com/apache/datafusion-comet/issues/1127

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1907912677 ## native/core/src/execution/shuffle/codec.rs: ## @@ -0,0 +1,675 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [I] Missing "INFO" log level [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1127: URL: https://github.com/apache/datafusion-comet/issues/1127#issuecomment-2578747319 I am going to go ahead and close this since we no longer require off-heap to be enabled.

Re: [PR] chore: Improve shuffle configuration [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on code in PR #1207: URL: https://github.com/apache/datafusion-comet/pull/1207#discussion_r1907901646 ## docs/source/user-guide/tuning.md: ## @@ -78,43 +78,47 @@ It must be set before the Spark context is created. You can enable or disable Co at runt

Re: [PR] [comet-parquet-exec] fix: Set scan implementation choice via environment variable [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove merged PR #1231: URL: https://github.com/apache/datafusion-comet/pull/1231

Re: [I] Jan 1, 2025: This week(s) in DataFusion [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #13970: URL: https://github.com/apache/datafusion/issues/13970#issuecomment-2578738711 2025: The year of 1000 systems built on DataFusion: https://www.influxdata.com/blog/datafusion-2025-influxdb/

Re: [I] Missing "INFO" log level [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1127: URL: https://github.com/apache/datafusion-comet/issues/1127#issuecomment-2578730448 I do not see the issue when I build the latest from the main branch.

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer merged PR #980: URL: https://github.com/apache/datafusion-python/pull/980

Re: [PR] fix: Set scan implementation choice via environment variable [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on PR #1231: URL: https://github.com/apache/datafusion-comet/pull/1231#issuecomment-2578718804 @andygrove rebased

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer commented on code in PR #980: URL: https://github.com/apache/datafusion-python/pull/980#discussion_r1907894122 ## python/datafusion/context.py: ## @@ -472,6 +472,18 @@ def __init__( self.ctx = SessionContextInternal(config, runtime) +def enable_url_tab

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2578712143 Update: I spent some time trying to avoid calling `OrderingEquivalenceClass::normalized_oeq_class` as much. This helped but not enough to really fix the problem: - https://

[PR] WIP: Reduce time spent normalizing [datafusion]

2025-01-08 Thread via GitHub
alamb opened a new pull request, #14049: URL: https://github.com/apache/datafusion/pull/14049 Still a WIP ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/13748 ## Rationale for this change The continued re-normalization of eq

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
alamb commented on PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#issuecomment-2578675517 Wow, we have really been picking up steam on reviews in sqlparser. The next release is going to be sweet

Re: [I] Missing "INFO" log level [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1127: URL: https://github.com/apache/datafusion-comet/issues/1127#issuecomment-2578652346 Here is my repro. ## Comet 0.3.0 ``` scala> spark.read.parquet("/mnt/bigdata/tpch/sf100/lineitem.parquet").createTempView("lineitem") 25/01/08 13:56:35

[PR] ALTER TABLE DROP {COLUMN|CONSTRAINT} xxx RESTRICT [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
stepancheg opened a new pull request, #1651: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1651 https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_11_23_drop_column_definition https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_1
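The linked SQL:2016 grammar ends `DROP [ COLUMN ] <column name>` with a `<drop behavior>`, which is `CASCADE | RESTRICT`. A minimal sketch of parsing that optional trailing keyword, assuming a pre-tokenized `&[&str]` input; `DropBehavior` and `parse_optional_drop_behavior` are hypothetical names for illustration, not sqlparser-rs's actual AST or parser API.

```rust
/// Optional trailing `<drop behavior>` from SQL:2016 (`CASCADE | RESTRICT`).
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum DropBehavior {
    Cascade,
    Restrict,
}

/// Parse an optional drop behavior at `tokens[pos]` (case-insensitive),
/// returning it together with the position of the next unconsumed token.
fn parse_optional_drop_behavior(tokens: &[&str], pos: usize) -> (Option<DropBehavior>, usize) {
    let upper = tokens.get(pos).map(|t| t.to_ascii_uppercase());
    match upper.as_deref() {
        Some("CASCADE") => (Some(DropBehavior::Cascade), pos + 1),
        Some("RESTRICT") => (Some(DropBehavior::Restrict), pos + 1),
        // Absent or some other token: consume nothing.
        _ => (None, pos),
    }
}

fn main() {
    // ALTER TABLE t DROP COLUMN c RESTRICT
    let tokens = ["ALTER", "TABLE", "t", "DROP", "COLUMN", "c", "RESTRICT"];
    let (behavior, next) = parse_optional_drop_behavior(&tokens, 6);
    assert_eq!(behavior, Some(DropBehavior::Restrict));
    assert_eq!(next, 7);

    // End of input: no behavior, nothing consumed.
    let (behavior, next) = parse_optional_drop_behavior(&tokens, 7);
    assert_eq!(behavior, None);
    assert_eq!(next, 7);
}
```

Returning `None` rather than a default keeps the AST able to round-trip statements that omit the keyword.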

Re: [PR] [comet-parquet-exec] Fix regressions in DisableAQECometShuffleSuite [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove merged PR #1237: URL: https://github.com/apache/datafusion-comet/pull/1237

Re: [PR] fix: Set scan implementation choice via environment variable [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on PR #1231: URL: https://github.com/apache/datafusion-comet/pull/1231#issuecomment-2578538520 Thanks @parthchandra. LGTM but now needs a rebase.

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-08 Thread via GitHub
comphead commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2578524584 Oh, with literal support I think the code becomes much trickier. I'm wondering whether the performance benefit is still worth such complications. @tlm365 can we add a criterion to check `find_
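To show why literal support adds a second code path: the general case splits the comma-separated list for every row, while a literal list can be split once and reused for the whole column. A minimal sketch, assuming MySQL-style `FIND_IN_SET` semantics (1-based position, 0 if absent or if the list is empty, NULL if either argument is NULL) and plain Rust strings instead of Arrow arrays; `find_in_set` and `LiteralSet` are hypothetical, not the PR's code.

```rust
use std::collections::HashMap;

/// General path: split `list` on every call.
/// Returns the 1-based position, 0 if absent or if `list` is empty,
/// and None (NULL) if either input is NULL.
fn find_in_set(needle: Option<&str>, list: Option<&str>) -> Option<i32> {
    let (needle, list) = (needle?, list?);
    if list.is_empty() {
        return Some(0);
    }
    Some(list.split(',').position(|item| item == needle).map_or(0, |i| i as i32 + 1))
}

/// Hypothetical fast path for a literal `list`: split once, then
/// answer each row with a hash lookup.
struct LiteralSet {
    positions: HashMap<String, i32>,
}

impl LiteralSet {
    fn new(list: &str) -> Self {
        let mut positions = HashMap::new();
        if !list.is_empty() {
            for (i, item) in list.split(',').enumerate() {
                // Keep the first occurrence, matching `position` above.
                positions.entry(item.to_string()).or_insert(i as i32 + 1);
            }
        }
        Self { positions }
    }

    fn lookup(&self, needle: Option<&str>) -> Option<i32> {
        needle.map(|n| *self.positions.get(n).unwrap_or(&0))
    }
}

fn main() {
    assert_eq!(find_in_set(Some("b"), Some("a,b,c")), Some(2));
    assert_eq!(find_in_set(Some("z"), Some("a,b,c")), Some(0));
    assert_eq!(find_in_set(None, Some("a,b,c")), None);

    let set = LiteralSet::new("a,b,c");
    let column = [Some("c"), Some("a"), None, Some("q")];
    let out: Vec<Option<i32>> = column.iter().map(|v| set.lookup(*v)).collect();
    assert_eq!(out, vec![Some(3), Some(1), None, Some(0)]);
}
```

Both paths must agree on every edge case (duplicates, empty list, NULLs), which is the extra maintenance cost the comment weighs against the speedup; a benchmark is what would settle it.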

Re: [PR] [comet-parquet-exec] Fix regressions in DisableAQECometShuffleSuite [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on code in PR #1237: URL: https://github.com/apache/datafusion-comet/pull/1237#discussion_r1907735274 ## spark/src/main/scala/org/apache/spark/sql/comet/CometNativeScanExec.scala: ## @@ -120,4 +120,11 @@ object CometNativeScanExec extends DataTypeSupport {

Re: [PR] [comet-parquet-exec] Fix regressions in DisableAQECometShuffleSuite [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on code in PR #1237: URL: https://github.com/apache/datafusion-comet/pull/1237#discussion_r1907723702 ## spark/src/main/scala/org/apache/spark/sql/comet/CometNativeScanExec.scala: ## @@ -120,4 +120,11 @@ object CometNativeScanExec extends DataTypeSupport {

Re: [I] Inference of ListingTableConfig does not work (anymore) for compressed json file [datafusion]

2025-01-08 Thread via GitHub
alamb closed issue #14016: Inference of ListingTableConfig does not work (anymore) for compressed json file URL: https://github.com/apache/datafusion/issues/14016

[PR] [comet-parquet-exec] Fix regression disable aqe [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove opened a new pull request, #1237: URL: https://github.com/apache/datafusion-comet/pull/1237 ## Which issue does this PR close? N/A ## Rationale for this change Fix failing tests in DisableAQECometShuffleSuite ## What changes are included i

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
comphead merged PR #14026: URL: https://github.com/apache/datafusion/pull/14026

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
alamb merged PR #14021: URL: https://github.com/apache/datafusion/pull/14021

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578474985 Thanks again @timvw

[PR] docs(ci): use up-to-date protoc with docs.rs [datafusion]

2025-01-08 Thread via GitHub
wackywendell opened a new pull request, #14048: URL: https://github.com/apache/datafusion/pull/14048 ## Which issue does this PR close? Closes #13853. ## Rationale for this change This uses the same basic solution as in `substrait-rs`: https://github.com/substrait

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
timvw commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578428357 > > 🤔 something seems to be broken now > > I feel bad that this broke after my suggestion -- here is a proposal targeting this branch to fix it: > > * [Fix inferring logic

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578417153 > 🤔 something seems to be broken now I feel bad that this broke after my suggestion -- here is a proposal targeting this branch to fix it: - https://github.com/timvw/datafusio

Re: [PR] Simplify error handling in case.rs (#13990) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on code in PR #14033: URL: https://github.com/apache/datafusion/pull/14033#discussion_r1907660263 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -369,11 +366,8 @@ impl CaseExpr { // evaluate when expression let when_value = self.when

Re: [PR] Encapsulate fields of `OrderingEquivalenceClass` (make field non pub) [datafusion]

2025-01-08 Thread via GitHub
alamb merged PR #14037: URL: https://github.com/apache/datafusion/pull/14037

Re: [PR] Encapsulate fields of `OrderingEquivalenceClass` (make field non pub) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14037: URL: https://github.com/apache/datafusion/pull/14037#issuecomment-2578373024 Since I think this PR is unobjectionable I am merging it in -- I am happy to address any other comments as follow on PRs

Re: [PR] Add support for MS-SQL BEGIN/END TRY/CATCH [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio merged PR #1649: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1649

Re: [PR] Replace `ReferentialAction` enum in `DROP` statements [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio merged PR #1648: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1648

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2578342385 > We could just add that directive to datafusion/substrait/Cargo.toml and see if it fixes it in the next version? Any other ideas? This sounds like a great idea to me -- tha

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
nuno-faria commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1907601841 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1442,6 +1513,227 @@ mod tests { assert_optimized_plan_eq(plan, expected) } +/// veri

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2578322453 I tried a bit today to re-create this but was not able to What I tried was to create a highly compressed parquet file (48MB that has 1B rows with all repeated strings) and

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
nuno-faria commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1907597367 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -985,6 +985,77 @@ impl OptimizerRule for PushDownFilter { }

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer commented on PR #980: URL: https://github.com/apache/datafusion-python/pull/980#issuecomment-2578313618 Very nice. Thank you. I've kicked off CI and will merge if all goes through.

Re: [PR] Parse Postgres's LOCK TABLE statement [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio commented on code in PR #1614: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1614#discussion_r1907585091 ## src/ast/mod.rs: ## @@ -7278,16 +7279,126 @@ impl fmt::Display for SearchModifier { } } +/// A `LOCK TABLE ..` statement. MySQL and Postgres v

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-08 Thread via GitHub
tlm365 commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2578291774 I just pushed some updates to support scalar args. Could you please take a look? @jayzhan211 @comphead

Re: [I] Implement xxhash algorithms as part of the expression API [datafusion]

2025-01-08 Thread via GitHub
HectorPascual commented on issue #14044: URL: https://github.com/apache/datafusion/issues/14044#issuecomment-2578288000 Hi, thanks for the reply. That is true, although, my concern comes from this other issue raised in delta-rs project (link below), since I need to use this hash opera

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578285930 🤔 something seems to be broken now
