greedAuguria opened a new issue, #19650:
URL: https://github.com/apache/datafusion/issues/19650
### Describe the bug
When using Hive-style partitioned tables where partition values contain
URL-encoded characters (like `/` encoded as `%2F` or spaces as `%20`),
DataFusion returns the l
greedAuguria opened a new pull request, #19651:
URL: https://github.com/apache/datafusion/pull/19651
## Which issue does this PR close?
- Closes #19650.
## Rationale for this change
Currently, when DataFusion parses Hive-style partitioned paths (e.g.,
`s3://bucket/table/
alamb commented on PR #3054:
URL: https://github.com/apache/datafusion/pull/3054#issuecomment-3710473820
I don't know that there was any reason not to include `lpad` / `rpad` (I
don't think it was a deliberate choice)
--
This is an automated message from the Apache Git Service.
To respond
alamb commented on issue #19573:
URL: https://github.com/apache/datafusion/issues/19573#issuecomment-3710480077
> > I have a draft PR
[#19616](https://github.com/apache/datafusion/pull/19616) for one approach I
considered, which is to continue using a session level cache as is currently
milenkovicm merged PR #1363:
URL: https://github.com/apache/datafusion-ballista/pull/1363
milenkovicm commented on PR #1363:
URL:
https://github.com/apache/datafusion-ballista/pull/1363#issuecomment-3710479860
thanks lads, will merge this one
alamb commented on code in PR #19616:
URL: https://github.com/apache/datafusion/pull/19616#discussion_r2661565794
##
datafusion/core/src/execution/context/mod.rs:
##
@@ -1327,12 +1329,34 @@ impl SessionContext {
&& table_provider.table_type() == table_type
ntjohnson1 commented on PR #19549:
URL: https://github.com/apache/datafusion/pull/19549#issuecomment-3710556356
> now we are changing/improving the behavior of `drop_columns`, we should
probably update the documentation (wherever the right place is?). I mean after
this PR `drop_columns` now
alamb commented on issue #19210:
URL: https://github.com/apache/datafusion/issues/19210#issuecomment-3710426223
Next one:
- https://github.com/apache/datafusion/issues/19652
alamb opened a new issue, #19652:
URL: https://github.com/apache/datafusion/issues/19652
This is my weekly plan, mostly for my own organizational needs. I am making
it public in the hope that it helps others see what I am working on -- also I
spend so much time in GitHub the interface is v
alamb closed issue #19210: Andrew Lamb Weekly-ish Open Source plan - 2025-12-08
URL: https://github.com/apache/datafusion/issues/19210
timsaucer merged PR #1319:
URL: https://github.com/apache/datafusion-python/pull/1319
timsaucer merged PR #1318:
URL: https://github.com/apache/datafusion-python/pull/1318
timsaucer closed issue #1305: Full join on dataframe with only index yields
dropped rows
URL: https://github.com/apache/datafusion-python/issues/1305
timsaucer commented on PR #1318:
URL:
https://github.com/apache/datafusion-python/pull/1318#issuecomment-3710412009
I updated the description because I don't think this is a breaking change
since the `drop_duplicate_keys` wasn't released.
timsaucer commented on PR #1320:
URL:
https://github.com/apache/datafusion-python/pull/1320#issuecomment-3710417582
Closing this PR since the consensus has landed on using the coalesce
approach instead. Thank you for the PR and helpful discussions!
timsaucer closed pull request #1320: fix: Disallows dropping duplicate keys
when using full outer join
URL: https://github.com/apache/datafusion-python/pull/1320
alamb commented on issue #19425:
URL: https://github.com/apache/datafusion/issues/19425#issuecomment-3710429984
Realistically I am not planning on working on this
martin-g commented on code in PR #19570:
URL: https://github.com/apache/datafusion/pull/19570#discussion_r2661528785
##
datafusion/functions/src/string/split_part.rs:
##
@@ -219,22 +219,32 @@ where
.try_for_each(|((string, delimiter), n)| -> Result<(),
DataFusionError>
Brijesh-Thakkar commented on issue #3027:
URL:
https://github.com/apache/datafusion-comet/issues/3027#issuecomment-3711082419
take
mattcuento commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2662024965
##
Cargo.toml:
##
@@ -39,6 +39,7 @@ datafusion = "51.0.0"
datafusion-cli = "51.0.0"
datafusion-proto = "51.0.0"
datafusion-proto-common = "51.0.0"
+d
killzoner commented on issue #1258:
URL:
https://github.com/apache/datafusion-ballista/issues/1258#issuecomment-3711102969
Fresh from the stochastic parrot
https://github.com/apache/datafusion-ballista/pull/1364
killzoner opened a new pull request, #1364:
URL: https://github.com/apache/datafusion-ballista/pull/1364
# Which issue does this PR close?
Closes https://github.com/apache/datafusion-ballista/issues/1258
# Rationale for this change
We don't want undocumented
AntoinePrv commented on issue #19654:
URL: https://github.com/apache/datafusion/issues/19654#issuecomment-3711428561
@alamb I'm taking the liberty to ping you here since you seem to be working
on similar issues lately.
coderfender commented on code in PR #3017:
URL: https://github.com/apache/datafusion-comet/pull/3017#discussion_r2662319466
##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1957,41 +1967,46 @@ fn cast_string_to_int_with_range_check(
/// Equivalent to
/// - org.apache.
alamb commented on issue #14431:
URL: https://github.com/apache/datafusion/issues/14431#issuecomment-3711468279
I don't think so
I don't think @eliaperantoni is actively working on this area any more
I am not sure if @kumarUjjawal is doing so either
alamb commented on issue #19487:
URL: https://github.com/apache/datafusion/issues/19487#issuecomment-3712022437
I'll give it a review shortly
xavlee commented on issue #19655:
URL: https://github.com/apache/datafusion/issues/19655#issuecomment-3712011265
take
alamb commented on code in PR #19625:
URL: https://github.com/apache/datafusion/pull/19625#discussion_r2662746848
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs:
##
@@ -59,6 +59,10 @@ pub struct NullState {
/// If `seen_values[i]` is
alamb commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3712034479
> Oh right, yes it will do that sorry, been years since I wrote that code
(and it looks like there's some new PushDecoder anyway that might change all of
this).
FWIW the pus
alamb commented on code in PR #18906:
URL: https://github.com/apache/datafusion/pull/18906#discussion_r2662773701
##
datafusion/physical-plan/src/aggregates/group_values/row.rs:
##
@@ -206,37 +233,52 @@ impl GroupValues for GroupValuesRows {
output
alamb-ghbot commented on PR #19562:
URL: https://github.com/apache/datafusion/pull/19562#issuecomment-3712048320
🤖 `./gh_compare_branch.sh`
[gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh)
Running
Linux aal-dev 6.14.0-1018-gc
alamb commented on PR #19562:
URL: https://github.com/apache/datafusion/pull/19562#issuecomment-3712047945
run benchmarks
andygrove commented on code in PR #3018:
URL: https://github.com/apache/datafusion-comet/pull/3018#discussion_r2662775773
##
native/spark-expr/src/json_funcs/to_json.rs:
##
@@ -181,6 +188,23 @@ fn escape_string(input: &str) -> String {
escaped_string
}
+fn normalize_spec
andygrove commented on code in PR #3018:
URL: https://github.com/apache/datafusion-comet/pull/3018#discussion_r2662782397
##
native/spark-expr/src/json_funcs/to_json.rs:
##
@@ -181,6 +188,23 @@ fn escape_string(input: &str) -> String {
escaped_string
}
+fn normalize_spec
alamb merged PR #19644:
URL: https://github.com/apache/datafusion/pull/19644
alamb merged PR #19645:
URL: https://github.com/apache/datafusion/pull/19645
alamb opened a new issue, #19656:
URL: https://github.com/apache/datafusion/issues/19656
### Describe the bug
Here is an example:
https://github.com/apache/datafusion/actions/runs/20728924060/job/59511596846
### To Reproduce
It appears due to `aws-smithy-runtime`
milenkovicm commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2661979782
##
ballista/scheduler/Cargo.toml:
##
@@ -52,6 +53,7 @@ clap = { workspace = true, optional = true }
dashmap = { workspace = true }
datafusion = { wor
mzabaluev commented on PR #19496:
URL: https://github.com/apache/datafusion/pull/19496#issuecomment-3710992619
Benchmark results against the branch base
```
nth_value_ignore_nulls/first_value_expanding/0%_nulls
time: [229.32 µs 229.97 µs 230.68 µs]
haohuaijin opened a new pull request, #19653:
URL: https://github.com/apache/datafusion/pull/19653
## Which issue does this PR close?
close https://github.com/apache/datafusion/issues/19638
## Rationale for this change
see issue #19638
## What changes are included
mzabaluev commented on code in PR #19496:
URL: https://github.com/apache/datafusion/pull/19496#discussion_r2661969517
##
datafusion/functions-window/src/nth_value.rs:
##
@@ -519,6 +467,87 @@ impl PartitionEvaluator for NthValueEvaluator {
}
}
+impl NthValueEvaluator {
+
mattcuento commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2662111775
##
Cargo.toml:
##
@@ -39,6 +39,7 @@ datafusion = "51.0.0"
datafusion-cli = "51.0.0"
datafusion-proto = "51.0.0"
datafusion-proto-common = "51.0.0"
+d
milenkovicm commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2662119757
##
Cargo.toml:
##
@@ -39,6 +39,7 @@ datafusion = "51.0.0"
datafusion-cli = "51.0.0"
datafusion-proto = "51.0.0"
datafusion-proto-common = "51.0.0"
+
alamb commented on PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#issuecomment-3711927679
Hi @xudong963 -- I am now back from vacation and will review this PR either
later today or tomorrow
Dandandan commented on code in PR #19622:
URL: https://github.com/apache/datafusion/pull/19622#discussion_r2662671140
##
datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs:
##
@@ -564,6 +566,7 @@ fn test_pushdown_through_aggregates_on_grouping_columns() {
// 2.
alamb-ghbot commented on PR #19590:
URL: https://github.com/apache/datafusion/pull/19590#issuecomment-3711931023
🤖 Hi @alamb, thanks for the request
(https://github.com/apache/datafusion/pull/19590#issuecomment-3711930835).
[`scrape_comments.py`](https://github.com/alamb/datafusion-b
alamb commented on PR #19590:
URL: https://github.com/apache/datafusion/pull/19590#issuecomment-3711930835
run benchmark substr_index
alamb commented on PR #19622:
URL: https://github.com/apache/datafusion/pull/19622#issuecomment-3711902737
run benchmark clickbench_partitioned
alamb-ghbot commented on PR #19622:
URL: https://github.com/apache/datafusion/pull/19622#issuecomment-3711903165
🤖 `./gh_compare_branch.sh`
[gh_compare_branch.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_branch.sh)
Running
Linux aal-dev 6.14.0-1018-gc
alamb commented on PR #19622:
URL: https://github.com/apache/datafusion/pull/19622#issuecomment-3711902125
> QQuery 23 still seems to be leading ahead!
I suspect this has to do with timing. Basically Q23 is a `select * from
... WHERE ...` type query
This can now take adv
alamb commented on code in PR #19622:
URL: https://github.com/apache/datafusion/pull/19622#discussion_r2662653579
##
datafusion/physical-plan/src/coalesce_batches.rs:
##
@@ -57,6 +57,10 @@ use futures::stream::{Stream, StreamExt};
/// reaches the `fetch` value.
///
/// See [`
alamb commented on code in PR #19590:
URL: https://github.com/apache/datafusion/pull/19590#discussion_r2662689222
##
datafusion/functions/src/unicode/substrindex.rs:
##
@@ -182,7 +182,8 @@ fn substr_index_general<
where
T::Native: OffsetSizeTrait,
{
-let mut builder =
mattcuento commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2661844363
##
ballista/scheduler/Cargo.toml:
##
@@ -52,6 +53,7 @@ clap = { workspace = true, optional = true }
dashmap = { workspace = true }
datafusion = { work
milenkovicm commented on issue #1258:
URL:
https://github.com/apache/datafusion-ballista/issues/1258#issuecomment-375904
thanks @killzoner
will have fun rebasing #1361 😀
mattcuento commented on PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#issuecomment-3711375113
Will review the latest `test linux balista/crates` failures this evening
mattcuento commented on PR #1365:
URL:
https://github.com/apache/datafusion-ballista/pull/1365#issuecomment-3711357963
Eh that's silly, I'll update both here to get merged together.
milenkovicm commented on PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#issuecomment-3711385863
perhaps the action is running out of disk space? looks like the linker is freaking out,
which should not be related to your change
parthchandra commented on code in PR #2948:
URL: https://github.com/apache/datafusion-comet/pull/2948#discussion_r2662264601
##
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##
@@ -478,29 +478,33 @@ case class CometScanRule(session: SparkSession) extends
Rule
coderfender commented on PR #3017:
URL:
https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3711389280
@andygrove , sure
Here are the benchmarks compared through `critcmp`
```
groupfeature
ma
AntoinePrv opened a new issue, #19654:
URL: https://github.com/apache/datafusion/issues/19654
### Is your feature request related to a problem or challenge?
The `offset` option of `DataFrame::limit` is not used to skip rows when reading a
Parquet file.
Using this reproducer with `datafu
GaneshPatil7517 commented on PR #19619:
URL: https://github.com/apache/datafusion/pull/19619#issuecomment-3711788173
> @adriangb could you run the workflows again? @GaneshPatil7517 since
`with_projection` and `with_batch_size` are being deprecated, we also need to
update those uses in DataF
mattcuento commented on PR #1365:
URL:
https://github.com/apache/datafusion-ballista/pull/1365#issuecomment-3711788599
@milenkovicm yep please feel free to merge, looks like it's no longer in
draft now
coderfender commented on PR #3017:
URL:
https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3711785788
In other notes, I was also experimenting in implementing a two pass fast
algorithm using a switch fallthrough (similar to what my friend wrote here) but
the implementation b
andygrove commented on PR #3017:
URL:
https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3711805196
> In other notes, I was also experimenting in implementing a two pass fast
algorithm using a switch fallthrough but the implementation became super
complicated with diminishin
NGA-TRAN opened a new issue, #19655:
URL: https://github.com/apache/datafusion/issues/19655
### Is your feature request related to a problem or challenge?
We have a use case where the query groups by columns that are implicitly
sorted, and we would like DataFusion to recognize that or
andygrove commented on PR #3035:
URL:
https://github.com/apache/datafusion-comet/pull/3035#issuecomment-3711817923
Thanks, @Kimahriman. Please also add content to the documentation (either
the user guide or the contributor guide) explaining this new feature.
milenkovicm merged PR #1365:
URL: https://github.com/apache/datafusion-ballista/pull/1365
milenkovicm commented on PR #1365:
URL:
https://github.com/apache/datafusion-ballista/pull/1365#issuecomment-3711839508
thanks @mattcuento
geoffreyclaude commented on code in PR #130:
URL: https://github.com/apache/datafusion-site/pull/130#discussion_r2661730358
##
content/blog/2025-12-18-extending-sql.md:
##
@@ -0,0 +1,379 @@
+---
+layout: post
+title: Extending SQL in DataFusion: from ->> to TABLESAMPLE
+date: 20
pepijnve commented on code in PR #17867:
URL: https://github.com/apache/datafusion/pull/17867#discussion_r2661771351
##
datafusion-cli/src/progress/plan_introspect.rs:
##
@@ -0,0 +1,217 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
Brijesh-Thakkar closed pull request #19598: perf: optimize bit_length for
string arrays
URL: https://github.com/apache/datafusion/pull/19598
milenkovicm commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2661989973
##
Cargo.toml:
##
@@ -39,6 +39,7 @@ datafusion = "51.0.0"
datafusion-cli = "51.0.0"
datafusion-proto = "51.0.0"
datafusion-proto-common = "51.0.0"
+
mzabaluev commented on code in PR #19496:
URL: https://github.com/apache/datafusion/pull/19496#discussion_r2661996145
##
datafusion/functions-window/src/nth_value.rs:
##
@@ -519,6 +467,87 @@ impl PartitionEvaluator for NthValueEvaluator {
}
}
+impl NthValueEvaluator {
+
Brijesh-Thakkar closed pull request #19581: perf: optimize octet_length for
string arrays
URL: https://github.com/apache/datafusion/pull/19581
Brijesh-Thakkar closed pull request #19587: Fix NULL handling in
ScalarValue::partial_cmp (closes #19579)
URL: https://github.com/apache/datafusion/pull/19587
mattcuento commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2662200067
##
Cargo.toml:
##
@@ -39,6 +39,7 @@ datafusion = "51.0.0"
datafusion-cli = "51.0.0"
datafusion-proto = "51.0.0"
datafusion-proto-common = "51.0.0"
+d
mbutrovich merged PR #3038:
URL: https://github.com/apache/datafusion-comet/pull/3038
mbutrovich merged PR #3039:
URL: https://github.com/apache/datafusion-comet/pull/3039
mattcuento opened a new pull request, #1365:
URL: https://github.com/apache/datafusion-ballista/pull/1365
# Which issue does this PR close?
Closes #.
# Rationale for this change
Bumping rust version to keep up to date with the
`ballista-builder.Dockerfile`. It w
andygrove commented on PR #3017:
URL:
https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3711326648
Thanks @coderfender. I think it would be useful to add a criterion benchmark
as well, so we can more easily measure the improvement compared to the main
branch.
coderfender commented on PR #3017:
URL:
https://github.com/apache/datafusion-comet/pull/3017#issuecomment-3711333041
Sure @andygrove . Let me get the benchmark file from stash and push a
commit
milenkovicm commented on PR #1365:
URL:
https://github.com/apache/datafusion-ballista/pull/1365#issuecomment-3711347038
https://github.com/apache/datafusion-ballista/blob/8ac74028c5f21faf519a812b5cb44946a389dc81/dev/docker/ballista-builder.Dockerfile#L18
as well
mattcuento commented on PR #1365:
URL:
https://github.com/apache/datafusion-ballista/pull/1365#issuecomment-3711350775
@milenkovicm the ballista-builder reference will get bumped in #1360 to fix
the build issues with the protobuf compiler 🙂
alamb commented on PR #19639:
URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3711961830
Did you find any evidence that the selectivity of predicates changes over
the course of the query (or put another way that reordering them during
execution would help?)
Dandandan commented on code in PR #19411:
URL: https://github.com/apache/datafusion/pull/19411#discussion_r2662706360
##
datafusion/common/src/config.rs:
##
@@ -468,6 +468,25 @@ config_namespace! {
/// metadata memory consumption
pub batch_size: usize, default
Dandandan commented on PR #19411:
URL: https://github.com/apache/datafusion/pull/19411#issuecomment-3711965227
Can you solve the conflicts?
adriangb commented on PR #19639:
URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3711967371
> Did you find any evidence that the selectivity of predicates changes over
the course of the query (or put another way that reordering them during
execution would help?)
I i
alamb commented on code in PR #19616:
URL: https://github.com/apache/datafusion/pull/19616#discussion_r2662704588
##
datafusion/execution/src/cache/list_files_cache.rs:
##
@@ -146,9 +149,12 @@ pub const DEFAULT_LIST_FILES_CACHE_MEMORY_LIMIT: usize =
1024 * 1024; // 1MiB
/// Th
andygrove commented on code in PR #3018:
URL: https://github.com/apache/datafusion-comet/pull/3018#discussion_r2662709821
##
native/spark-expr/src/json_funcs/to_json.rs:
##
@@ -181,6 +188,23 @@ fn escape_string(input: &str) -> String {
escaped_string
}
+fn normalize_spec
alamb-ghbot commented on PR #19622:
URL: https://github.com/apache/datafusion/pull/19622#issuecomment-371196
🤖: Benchmark completed
Details
```
Comparing HEAD and feat-deprecate-coalesce-batches
Benchmark clickbench_partitioned.json
-
codecov-commenter commented on PR #3018:
URL:
https://github.com/apache/datafusion-comet/pull/3018#issuecomment-3711988180
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/3018?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
andygrove commented on PR #3011:
URL:
https://github.com/apache/datafusion-comet/pull/3011#issuecomment-3711987315
@kazantsev-maksim could you merge latest from main to disable the failing
test
AdamGS commented on issue #18566:
URL: https://github.com/apache/datafusion/issues/18566#issuecomment-3711986198
Tested with vortex and it looks good -
https://github.com/vortex-data/vortex/pull/5863
alamb-ghbot commented on PR #19590:
URL: https://github.com/apache/datafusion/pull/19590#issuecomment-3711989071
🤖 `./gh_compare_branch_bench.sh`
[compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh)
Running
Linux aal-dev 6.
alamb commented on issue #19654:
URL: https://github.com/apache/datafusion/issues/19654#issuecomment-3711997077
Hi @AntoinePrv -- I am definitely surprised at this finding (I would expect
DataFusion to do this pretty fast)
I think what is happening is that datafusion is not pushing d
alamb commented on issue #19654:
URL: https://github.com/apache/datafusion/issues/19654#issuecomment-3711997858
So TLDR is we need to implement the `offset` optimization in the parquet scan
alamb-ghbot commented on PR #19590:
URL: https://github.com/apache/datafusion/pull/19590#issuecomment-3712006434
🤖: Benchmark completed
Details
```
group main
perf_substrindex
-
milenkovicm commented on code in PR #1360:
URL:
https://github.com/apache/datafusion-ballista/pull/1360#discussion_r2661903007
##
ballista/scheduler/Cargo.toml:
##
@@ -52,6 +53,7 @@ clap = { workspace = true, optional = true }
dashmap = { workspace = true }
datafusion = { wor
andygrove opened a new issue, #3040:
URL: https://github.com/apache/datafusion-comet/issues/3040
### What is the problem the feature request solves?
`auto` scan mode should select `native_datafusion` for supported use cases.
### Describe the potential solution
_No respons