Re: [PR] chore: bump ruby/setup-ruby from 1.227.0 to 1.229.0 [arrow-adbc]

2025-03-31 Thread via GitHub
lidavidm merged PR #2662: URL: https://github.com/apache/arrow-adbc/pull/2662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected]

Re: [PR] chore(java): bump com.uber.nullaway:nullaway from 0.12.4 to 0.12.6 in /java [arrow-adbc]

2025-03-31 Thread via GitHub
lidavidm merged PR #2660: URL: https://github.com/apache/arrow-adbc/pull/2660

Re: [I] ArrowReaderMetadata API makes it too easy to (accidentally) make an additional object store request [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on issue #6476: URL: https://github.com/apache/arrow-rs/issues/6476#issuecomment-2767676820 Thanks @etseidl -- that would be great

Re: [PR] chore(go/adbc): bump google.golang.org/api from 0.227.0 to 0.228.0 in /go/adbc [arrow-adbc]

2025-03-31 Thread via GitHub
lidavidm commented on PR #2659: URL: https://github.com/apache/arrow-adbc/pull/2659#issuecomment-2767675640 Seems this broke because of https://github.com/actions/setup-go/issues/457

Re: [PR] Remove `AsyncFileReader::get_metadata_with_options`, add `options` to `AsyncFileReader::get_metadata` [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on PR #7342: URL: https://github.com/apache/arrow-rs/pull/7342#issuecomment-2767676699 I updated this PR's title to reflect what I think it currently does It looks like this PR has some merge conflicts, but once those are solved it will be good to go from my perspectiv

[PR] Mimalloc2.1.9 [arrow]

2025-03-31 Thread via GitHub
pitrou opened a new pull request, #45983: URL: https://github.com/apache/arrow/pull/45983 Thanks for opening a pull request! If this is your first pull request you can find detailed information on how to contribute here: * [New Contributor's Guide](https://arrow.apache.org/d

Re: [PR] feat(csharp): Implement CloudFetch for Databricks Spark driver [arrow-adbc]

2025-03-31 Thread via GitHub
CurtHagenlocher merged PR #2634: URL: https://github.com/apache/arrow-adbc/pull/2634

Re: [I] Expose flexible apis for building bloom filters when writing parquet [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on issue #5108: URL: https://github.com/apache/arrow-rs/issues/5108#issuecomment-2766471776 To support this use case I suggest building your custom structure and then saving it as user defined metadata in the parquet file. This requires no changes to the parquet crate and woul

Re: [I] [CI][Python] A new version (77.0.2) of setuptools seems to have broken some of our builds [arrow]

2025-03-31 Thread via GitHub
kou commented on issue #45867: URL: https://github.com/apache/arrow/issues/45867#issuecomment-2767874887 In general, we should create `python/LICENSE.txt` and `python/NOTICE.txt` instead of symbolic linking to the top-level `LICENSE.txt`/`NOTICE.txt` because the top-level `LICENSE.txt`/`NOT

[PR] MINOR: [C#] Bump Google.Protobuf and System.Memory in /csharp [arrow]

2025-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #45984: URL: https://github.com/apache/arrow/pull/45984 Bumps [Google.Protobuf](https://github.com/protocolbuffers/protobuf) and System.Memory. These dependencies needed to be updated together. Updates `Google.Protobuf` from 3.30.1 to 3.30.2

Re: [PR] MINOR: [CI] Bump actions/setup-python from 5.4.0 to 5.5.0 [arrow]

2025-03-31 Thread via GitHub
kou merged PR #45982: URL: https://github.com/apache/arrow/pull/45982

Re: [PR] chore: Bump actions/setup-python from 5.4.0 to 5.5.0 [arrow-go]

2025-03-31 Thread via GitHub
kou merged PR #338: URL: https://github.com/apache/arrow-go/pull/338

Re: [PR] feat(csharp): Implement CloudFetch for Databricks Spark driver [arrow-adbc]

2025-03-31 Thread via GitHub
CurtHagenlocher commented on code in PR #2634: URL: https://github.com/apache/arrow-adbc/pull/2634#discussion_r2021626037 ## csharp/src/Drivers/Apache/Spark/CloudFetch/SparkCloudFetchReader.cs: ## @@ -0,0 +1,269 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] MINOR: [C#] Bump System.ValueTuple from 4.5.0 to 4.6.1 in /csharp [arrow]

2025-03-31 Thread via GitHub
dependabot[bot] commented on PR #45912: URL: https://github.com/apache/arrow/pull/45912#issuecomment-2767184857 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] MINOR: [C#] Bump System.ValueTuple from 4.5.0 to 4.6.1 in /csharp [arrow]

2025-03-31 Thread via GitHub
CurtHagenlocher closed pull request #45912: MINOR: [C#] Bump System.ValueTuple from 4.5.0 to 4.6.1 in /csharp URL: https://github.com/apache/arrow/pull/45912

Re: [PR] GH-36411: [C++][Python] Use meson-python for PyArrow build system [arrow]

2025-03-31 Thread via GitHub
WillAyd commented on code in PR #45854: URL: https://github.com/apache/arrow/pull/45854#discussion_r2021274909 ## dev/release/02-source-test.rb: ## @@ -84,7 +84,12 @@ def test_csharp_git_commit_information def test_python_version source Dir.chdir("#{@tag_name_no_rc}

Re: [PR] Add Parquet Modular encryption support (write) [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on PR #7111: URL: https://github.com/apache/arrow-rs/pull/7111#issuecomment-2767084736 Thanks @adamreeve @corwinjoy and @rok ! I'll plan to merge this tomorrow This PR appeared to have a conflict so I took the liberty of merging up from main ![Screenshot 2025-0

Re: [PR] GH-43057: [C++] Thread-safe AesEncryptor / AesDecryptor [arrow]

2025-03-31 Thread via GitHub
AlenkaF commented on PR #44990: URL: https://github.com/apache/arrow/pull/44990#issuecomment-2768315368 @EnricoMi there are CMake issues that need to be fixed so I am not sure there will be capacity for the final review before the feature freeze.

[PR] chore: Bump actions/setup-python from 5.4.0 to 5.5.0 [arrow-go]

2025-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #338: URL: https://github.com/apache/arrow-go/pull/338 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.4.0 to 5.5.0. Release notes Sourced from https://github.com/actions/setup-python/releases actions/setup-p

[PR] chore(rust): bump the arrow group in /rust with 2 updates [arrow-adbc]

2025-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #2661: URL: https://github.com/apache/arrow-adbc/pull/2661 Bumps the arrow group in /rust with 2 updates: [arrow-buffer](https://github.com/apache/arrow-rs) and [arrow-schema](https://github.com/apache/arrow-rs). Updates `arrow-buffer` from

Re: [I] [C++] Potential bug in ReadaheadGenerator [arrow]

2025-03-31 Thread via GitHub
mapleFU commented on issue #45953: URL: https://github.com/apache/arrow/issues/45953#issuecomment-2767205645 Issue resolved by pull request 45954 https://github.com/apache/arrow/pull/45954

[PR] Test int96 Parquet file from Spark [arrow-rs]

2025-03-31 Thread via GitHub
mbutrovich opened a new pull request, #7367: URL: https://github.com/apache/arrow-rs/pull/7367 # Which issue does this PR close? I think we can close out #7220 after this test merges. # Rationale for this change We would like to enforce testing on a chal

Re: [PR] Test int96 Parquet file from Spark [arrow-rs]

2025-03-31 Thread via GitHub
mbutrovich commented on code in PR #7367: URL: https://github.com/apache/arrow-rs/pull/7367#discussion_r2021824338 ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -978,6 +972,11 @@ mod tests { ArrowError, DataType as ArrowDataType, Field, Fields, Schema, SchemaRef, Tim

Re: [PR] chore(go/adbc): bump google.golang.org/api from 0.227.0 to 0.228.0 in /go/adbc [arrow-adbc]

2025-03-31 Thread via GitHub
lidavidm commented on PR #2659: URL: https://github.com/apache/arrow-adbc/pull/2659#issuecomment-2768108755 Ugh. @zeroshade staticcheck needs to be built with Go 1.24 to check some files in this dep, so I bumped the Go version, which required bumping golangci-lint, which (1) made a b

Re: [PR] GH-39811: [R] better documentation for col_types argument in open_delim_dataset [arrow]

2025-03-31 Thread via GitHub
atsyplenkov commented on PR #45719: URL: https://github.com/apache/arrow/pull/45719#issuecomment-2768113139 @jonkeane sorry for the long turnaround; this issue fell through the cracks. I followed the approach suggested by @thisisnic. I added a number of tests, and as far as I can see, every

Re: [PR] MINOR: [C#] Bump Google.Protobuf and System.Memory in /csharp [arrow]

2025-03-31 Thread via GitHub
CurtHagenlocher merged PR #45984: URL: https://github.com/apache/arrow/pull/45984

[PR] chore(java): bump com.uber.nullaway:nullaway from 0.12.4 to 0.12.6 in /java [arrow-adbc]

2025-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #2660: URL: https://github.com/apache/arrow-adbc/pull/2660 Bumps [com.uber.nullaway:nullaway](https://github.com/uber/NullAway) from 0.12.4 to 0.12.6. Release notes Sourced from https://github.com/uber/NullAway/releases com.uber.nullaway

Re: [I] ArrowReaderMetadata API makes it too easy to (accidentally) make an additional object store request [arrow-rs]

2025-03-31 Thread via GitHub
etseidl commented on issue #6476: URL: https://github.com/apache/arrow-rs/issues/6476#issuecomment-2767350883 @adamreeve recently jogged my memory of this issue (https://github.com/apache/arrow-rs/pull/7342#discussion_r2020302189). I think the recent changes to the API (#6637, #7334, #7342)

Re: [PR] feat: add rounding logic and scale zero fix parse_decimal to match parse_string_to_decimal_native behavior [arrow-rs]

2025-03-31 Thread via GitHub
himadripal commented on code in PR #7179: URL: https://github.com/apache/arrow-rs/pull/7179#discussion_r2021953487 ## arrow-cast/src/parse.rs: ## @@ -850,7 +850,16 @@ fn parse_e_notation( } if exp < 0 { -result = result.div_wrapping(base.pow_wrapping(-exp as

Re: [PR] Remove `AsyncFileReader::get_metadata_with_options`, add `options` to `AsyncFileReader::get_metadata` [arrow-rs]

2025-03-31 Thread via GitHub
corwinjoy commented on PR #7342: URL: https://github.com/apache/arrow-rs/pull/7342#issuecomment-2767740244 > I updated this PR's title to reflect what I think it currently does > > It looks like this PR has some merge conflicts, but once those are solved it will be good to go from my

Re: [PR] feat(parquet/metadata): bloom filter implementation [arrow-go]

2025-03-31 Thread via GitHub
zeroshade commented on code in PR #336: URL: https://github.com/apache/arrow-go/pull/336#discussion_r2021255204 ## parquet/metadata/column_chunk.go: ## @@ -349,12 +357,12 @@ type EncodingStats struct { // Finish finalizes the metadata with the given offsets, // flushes any com

Re: [PR] feat: add rounding logic and scale zero fix parse_decimal to match parse_string_to_decimal_native behavior [arrow-rs]

2025-03-31 Thread via GitHub
himadripal commented on code in PR #7179: URL: https://github.com/apache/arrow-rs/pull/7179#discussion_r2021959530 ## arrow-cast/src/parse.rs: ## @@ -986,6 +1009,13 @@ pub fn parse_decimal( "parse decimal overflow ({s})" ))); } +if

Re: [PR] GH-43057: [C++] Thread-safe AesEncryptor / AesDecryptor [arrow]

2025-03-31 Thread via GitHub
EnricoMi commented on PR #44990: URL: https://github.com/apache/arrow/pull/44990#issuecomment-2768170465 @pitrou do you think it is likely to get this in before the feature freeze?

Re: [PR] feat: add rounding logic and scale zero fix parse_decimal to match parse_string_to_decimal_native behavior [arrow-rs]

2025-03-31 Thread via GitHub
kazuyukitanimura commented on code in PR #7179: URL: https://github.com/apache/arrow-rs/pull/7179#discussion_r2022138201 ## arrow-cast/src/parse.rs: ## @@ -986,6 +1009,13 @@ pub fn parse_decimal( "parse decimal overflow ({s})" ))); } +

[PR] chore: Bump modernc.org/sqlite from 1.29.6 to 1.37.0 [arrow-go]

2025-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #339: URL: https://github.com/apache/arrow-go/pull/339 Bumps [modernc.org/sqlite](https://gitlab.com/cznic/sqlite) from 1.29.6 to 1.37.0. Commits https://gitlab.com/cznic/sqlite/commit/dc8212054b608339e80d7e986e530fa24bc5e369 dc82120

Re: [PR] GH-45185: [C++][Parquet] Raise an error for invalid repetition levels when delimiting records [arrow]

2025-03-31 Thread via GitHub
adamreeve commented on PR #45186: URL: https://github.com/apache/arrow/pull/45186#issuecomment-2767330231 Thanks!

[PR] feat(csharp): Add retry-after behavior for 503 responses in Spark ADBC driver [arrow-adbc]

2025-03-31 Thread via GitHub
jadewang-db opened a new pull request, #2664: URL: https://github.com/apache/arrow-adbc/pull/2664 ## Description This PR implements retry-after behavior for the Spark ADBC driver when receiving 503 responses with Retry-After headers. This is particularly useful for Databricks cluster

Re: [PR] feat(csharp): Add retry-after behavior for 503 responses in Spark ADBC driver [arrow-adbc]

2025-03-31 Thread via GitHub
jadewang-db closed pull request #2657: feat(csharp): Add retry-after behavior for 503 responses in Spark ADBC driver URL: https://github.com/apache/arrow-adbc/pull/2657

Re: [PR] chore: Bump modernc.org/sqlite from 1.29.6 to 1.36.2 [arrow-go]

2025-03-31 Thread via GitHub
dependabot[bot] closed pull request #329: chore: Bump modernc.org/sqlite from 1.29.6 to 1.36.2 URL: https://github.com/apache/arrow-go/pull/329

Re: [I] Parquet decoder / decoded page Cache [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on issue #7363: URL: https://github.com/apache/arrow-rs/issues/7363#issuecomment-2767361270 I have run some performance tests, see: - https://github.com/apache/datafusion/pull/15506 I think we will need to look at the ones that report a slowdown more carefully to deter

Re: [PR] GH-45949: [R] Fix CRAN warnings for 19.0.1 about compiled code [arrow]

2025-03-31 Thread via GitHub
jonkeane commented on code in PR #45951: URL: https://github.com/apache/arrow/pull/45951#discussion_r2021527879 ## r/src/arrow_types.h: ## @@ -173,14 +173,33 @@ template class RBuffer : public MutableBuffer { public: explicit RBuffer(RVector vec) - : MutableBuffer(re

Re: [PR] GH-45953: [C++] Use lock to fix atomic bug in ReadaheadGenerator [arrow]

2025-03-31 Thread via GitHub
mapleFU merged PR #45954: URL: https://github.com/apache/arrow/pull/45954

Re: [PR] MINOR: [CI] Bump actions/setup-python from 5.4.0 to 5.5.0 [arrow]

2025-03-31 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #45982: URL: https://github.com/apache/arrow/pull/45982#issuecomment-2767964617 After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 60b5ab9ee0bf070f03cf5c92fe0add0257543dfd. None of the s

Re: [PR] feat: add rounding logic and scale zero fix parse_decimal to match parse_string_to_decimal_native behavior [arrow-rs]

2025-03-31 Thread via GitHub
himadripal commented on code in PR #7179: URL: https://github.com/apache/arrow-rs/pull/7179#discussion_r2021960934 ## arrow-cast/src/parse.rs: ## @@ -986,6 +1009,13 @@ pub fn parse_decimal( "parse decimal overflow ({s})" ))); } +if

Re: [I] [C++] Bump bundled AWS related libraries [arrow]

2025-03-31 Thread via GitHub
kou commented on issue #45993: URL: https://github.com/apache/arrow/issues/45993#issuecomment-2767773556 Ah, I should complete https://github.com/apache/arrow/pull/45306 .

Re: [PR] GH-45961: [Release][Docs] Upload generated docs to GitHub Releases not apache.jfrog.io [arrow]

2025-03-31 Thread via GitHub
kou commented on PR #45963: URL: https://github.com/apache/arrow/pull/45963#issuecomment-2767782144 Previously, the docs were generated after a release vote. But it sometimes failed. So, we changed to generate the docs in release process, keep it and upload it to apache/arrow-site after a r

Re: [PR] GH-45195: [C++] Update bundled AWS SDK for C++ to 1.11.489 [arrow]

2025-03-31 Thread via GitHub
kou commented on PR #45306: URL: https://github.com/apache/arrow/pull/45306#issuecomment-2767782488 I'll restart this for #45993.

[PR] chore: bump ruby/setup-ruby from 1.227.0 to 1.229.0 [arrow-adbc]

2025-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #2662: URL: https://github.com/apache/arrow-adbc/pull/2662 Bumps [ruby/setup-ruby](https://github.com/ruby/setup-ruby) from 1.227.0 to 1.229.0. Release notes Sourced from https://github.com/ruby/setup-ruby/releases ruby/setup-ruby's rel

Re: [PR] Parquet: POC for handling struct child via StatisticsConverter [arrow-rs]

2025-03-31 Thread via GitHub
kylebarron commented on code in PR #7365: URL: https://github.com/apache/arrow-rs/pull/7365#discussion_r2021852968 ## parquet/src/arrow/arrow_reader/statistics.rs: ## @@ -1290,32 +1290,69 @@ impl<'a> StatisticsConverter<'a> { /// /// * If the column is not found in the

Re: [I] object_store: abort_multipart() should return NotFound error if not found [arrow-rs-object-store]

2025-03-31 Thread via GitHub
ByteBaker commented on issue #146: URL: https://github.com/apache/arrow-rs-object-store/issues/146#issuecomment-2767090553 @alamb should we close this?

Re: [I] [Python] App Store Rejection Due to Non-Public APIs in PyArrow (libarrow.1900.dylib) [arrow]

2025-03-31 Thread via GitHub
lukedg97 commented on issue #45642: URL: https://github.com/apache/arrow/issues/45642#issuecomment-2767646143 The wheel built with this script has passed app review. This is a full workaround for me at this time. It would still of course be better to not have these symbols in

Re: [PR] MINOR: [C#] Bump Google.Protobuf and System.Memory in /csharp [arrow]

2025-03-31 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #45984: URL: https://github.com/apache/arrow/pull/45984#issuecomment-2767646562 After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 3828a2e5e2b168e06861528d2dcd1e58a27052d0. None of the s

Re: [PR] MINOR: Bump com.puppycrawl.tools:checkstyle from 10.21.4 to 10.22.0 [arrow-java]

2025-03-31 Thread via GitHub
kou merged PR #694: URL: https://github.com/apache/arrow-java/pull/694

Re: [I] [C++][Dataset] Parquet schema lost on dataset write [arrow]

2025-03-31 Thread via GitHub
wgtmac commented on issue #45969: URL: https://github.com/apache/arrow/issues/45969#issuecomment-2765615596 > expose an additional parameter to pass the full Parquet schema I'm in favor of this because it is simple. However, these logical types have different issues out there:

Re: [PR] fix(go/adbc/driver/snowflake): try to suppress stray logs [arrow-adbc]

2025-03-31 Thread via GitHub
lidavidm commented on PR #2608: URL: https://github.com/apache/arrow-adbc/pull/2608#issuecomment-2766032409 Looks like the snowflake PR merged, I will try and bump (temporarily) to see if it does fix the issue here

[PR] Print row, data present, expected type, and row number in error messages for arrow-csv [arrow-rs]

2025-03-31 Thread via GitHub
psiayn opened a new pull request, #7361: URL: https://github.com/apache/arrow-rs/pull/7361 # Which issue does this PR close? Closes #7344 . # Rationale for this change # What changes are included in this PR? This change allows for better er

Re: [PR] GH-45732: [C++][Compute] Accept more pivot key types [arrow]

2025-03-31 Thread via GitHub
kszucs commented on code in PR #45945: URL: https://github.com/apache/arrow/pull/45945#discussion_r2020612366 ## cpp/src/arrow/compute/kernels/pivot_internal.cc: ## @@ -18,110 +18,137 @@ #include "arrow/compute/kernels/pivot_internal.h" #include +#include +#include +#in

Re: [PR] GH-45732: [C++][Compute] Accept more pivot key types [arrow]

2025-03-31 Thread via GitHub
pitrou commented on code in PR #45945: URL: https://github.com/apache/arrow/pull/45945#discussion_r2021004395 ## cpp/src/arrow/acero/hash_aggregate_test.cc: ## @@ -4749,6 +4773,21 @@ TEST_P(GroupBy, PivotDuplicateKeys) { RunPivot(key_type, value_type, options, table_json)

Re: [PR] GH-45732: [C++][Compute] Accept more pivot key types [arrow]

2025-03-31 Thread via GitHub
pitrou commented on code in PR #45945: URL: https://github.com/apache/arrow/pull/45945#discussion_r2021003973 ## cpp/src/arrow/compute/exec.h: ## @@ -276,7 +276,7 @@ struct ExecValue { ArraySpan array = {}; const Scalar* scalar = NULLPTR; - ExecValue(Scalar* scalar) //

Re: [PR] GH-45853: [C++][Dev] Fix Meson compilation issues in Docker builds [arrow]

2025-03-31 Thread via GitHub
pitrou commented on code in PR #45858: URL: https://github.com/apache/arrow/pull/45858#discussion_r2021030294 ## dev/tasks/tasks.yml: ## @@ -815,7 +815,10 @@ tasks: ci: github template: docker-tests/github.linux.yml params: - flags: -e ARROW_USE_MESON=ON +

Re: [I] Pushdown predictions to Parquet in-memory row group fetches [arrow-rs]

2025-03-31 Thread via GitHub
ethe commented on issue #7348: URL: https://github.com/apache/arrow-rs/issues/7348#issuecomment-2766208486 Considering the APIs, how about directly implementing `datafusion_catalog::TableProvider` for `ParquetRecordBatchStream`? It is too complicated if there are three kinds of row filter m

Re: [PR] GH-45949: [R] Fix CRAN warnings for 19.0.1 about compiled code [arrow]

2025-03-31 Thread via GitHub
github-actions[bot] commented on PR #45951: URL: https://github.com/apache/arrow/pull/45951#issuecomment-2766286153 Revision: ec5161be763e506780e0eb01867a21106e2eabe9 Submitted crossbow builds: [ursacomputing/crossbow @ actions-50121181c0](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-45978: [C++] Bump bundled mimalloc version [arrow]

2025-03-31 Thread via GitHub
github-actions[bot] commented on PR #45979: URL: https://github.com/apache/arrow/pull/45979#issuecomment-2766349689 :warning: GitHub issue #45978 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] MINOR: [C++] Bump bundled mimalloc version [arrow]

2025-03-31 Thread via GitHub
pitrou commented on PR #44941: URL: https://github.com/apache/arrow/pull/44941#issuecomment-2766349202 Closing in favor of https://github.com/apache/arrow/pull/45979

Re: [PR] MINOR: [C++] Bump bundled mimalloc version [arrow]

2025-03-31 Thread via GitHub
pitrou closed pull request #44941: MINOR: [C++] Bump bundled mimalloc version URL: https://github.com/apache/arrow/pull/44941

Re: [PR] GH-45978: [C++] Bump bundled mimalloc version [arrow]

2025-03-31 Thread via GitHub
pitrou commented on PR #45979: URL: https://github.com/apache/arrow/pull/45979#issuecomment-2766350118 @github-actions crossbow submit -g cpp -g python

Re: [PR] GH-45614: [C++] Use Boost's CMake packages instead of FindBoost.cmake in CMake [arrow]

2025-03-31 Thread via GitHub
github-actions[bot] commented on PR #45623: URL: https://github.com/apache/arrow/pull/45623#issuecomment-2766269642 Revision: aac1fccd33e0c63ee93e509e539358e3e327c8b7 Submitted crossbow builds: [ursacomputing/crossbow @ actions-c37ec9d34c](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-45614: [C++] Use Boost's CMake packages instead of FindBoost.cmake in CMake [arrow]

2025-03-31 Thread via GitHub
pitrou commented on PR #45623: URL: https://github.com/apache/arrow/pull/45623#issuecomment-2766355268 AppVeyor doesn't fail anymore because of Boost, but it now fails later on mimalloc. That will hopefully be fixed by https://github.com/apache/arrow/pull/45979

Re: [PR] ARROW-17026: [C++] Add RLE benchmarks [arrow]

2025-03-31 Thread via GitHub
kszucs commented on PR #13653: URL: https://github.com/apache/arrow/pull/13653#issuecomment-2766266296 @zagto shall we keep this PR open?

Re: [PR] ARROW-12526: Pre-generating pyarrow.compute and creating a docstring additions system for pyarrow functions [arrow]

2025-03-31 Thread via GitHub
pitrou commented on PR #13126: URL: https://github.com/apache/arrow/pull/13126#issuecomment-2766362265 Perhaps @AlenkaF or someone else would be interested in reviving this?

Re: [PR] GH-45853: [C++][Dev] Fix Meson compilation issues in Docker builds [arrow]

2025-03-31 Thread via GitHub
WillAyd commented on code in PR #45858: URL: https://github.com/apache/arrow/pull/45858#discussion_r2021057784 ## ci/scripts/cpp_build.sh: ## @@ -118,12 +118,36 @@ if [ "${ARROW_USE_MESON:-OFF}" = "ON" ]; then fi } + ORIGINAL_CC="${CC}" + if [ -n "${CC}" ]; then +

Re: [PR] GH-45614: [C++] Use Boost's CMake packages instead of FindBoost.cmake in CMake [arrow]

2025-03-31 Thread via GitHub
github-actions[bot] commented on PR #45623: URL: https://github.com/apache/arrow/pull/45623#issuecomment-2766269419 Revision: aac1fccd33e0c63ee93e509e539358e3e327c8b7 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1f84f87ddc](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-45953: [C++] Use lock to fix atomic bug in ReadaheadGenerator [arrow]

2025-03-31 Thread via GitHub
pitrou commented on code in PR #45954: URL: https://github.com/apache/arrow/pull/45954#discussion_r2021141798 ## cpp/src/arrow/util/async_generator.h: ## @@ -772,20 +782,30 @@ class ReadaheadGenerator { Future operator()() { if (state_->readahead_queue.empty()) {

Re: [PR] Parquet: Support reading Parquet metadata via suffix range requests [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on PR #7334: URL: https://github.com/apache/arrow-rs/pull/7334#issuecomment-2766372555 Thanks again @kylebarron !

Re: [I] Read Parquet metadata via suffix requests [arrow-rs]

2025-03-31 Thread via GitHub
alamb closed issue #5979: Read Parquet metadata via suffix requests URL: https://github.com/apache/arrow-rs/issues/5979

Re: [PR] GH-45961: [Release][Docs] Upload generated docs to GitHub Releases not apache.jfrog.io [arrow]

2025-03-31 Thread via GitHub
pitrou commented on PR #45963: URL: https://github.com/apache/arrow/pull/45963#issuecomment-2766399569 Do we actually need to upload the docs somewhere? I had no idea we were doing this. Are people downloading them?

Re: [PR] GH-45953: [C++] Use lock to fix atomic bug in ReadaheadGenerator [arrow]

2025-03-31 Thread via GitHub
mapleFU commented on PR #45954: URL: https://github.com/apache/arrow/pull/45954#issuecomment-2766407754 Comment fixed, I'll merge this after all ci passes

Re: [PR] PoC: Add Predicate Pushdown to Parquet Reader for Optimized Query Performance [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on code in PR #7360: URL: https://github.com/apache/arrow-rs/pull/7360#discussion_r2021149241 ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -646,6 +716,125 @@ impl ParquetRecordBatchReaderBuilder { apply_range(selection, reader.num_rows(), self.of

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-03-31 Thread via GitHub
mapleFU commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2766409937 I'm a little busy today and will take a careful round tomorrow

Re: [PR] ARROW-12526: Pre-generating pyarrow.compute and creating a docstring additions system for pyarrow functions [arrow]

2025-03-31 Thread via GitHub
kszucs commented on PR #13126: URL: https://github.com/apache/arrow/pull/13126#issuecomment-2766261889 @jorisvandenbossche shall we keep this PR open?

Re: [I] Pushdown predictions to Parquet in-memory row group fetches [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on issue #7348: URL: https://github.com/apache/arrow-rs/issues/7348#issuecomment-2766416437 > **Are there alternatives?** > Probably there aren't, whether changing row selection or row filter to support value matching will brings breaking change. I may not understan

Re: [PR] Add documentation and examples for pretty printing, make `pretty_format_columns_with_options` pub [arrow-rs]

2025-03-31 Thread via GitHub
alamb merged PR #7346: URL: https://github.com/apache/arrow-rs/pull/7346

Re: [PR] GH-692: Preserve nullability information while transfering DecimalVector and Decimal256Vector [arrow-java]

2025-03-31 Thread via GitHub
jbonofre commented on code in PR #693: URL: https://github.com/apache/arrow-java/pull/693#discussion_r2021087618 ## vector/src/main/java/org/apache/arrow/vector/Decimal256Vector.java: ## @@ -566,9 +566,7 @@ private class TransferImpl implements TransferPair { Decimal256Vect

Re: [I] StringArrayView(Utf8View) slower cases compare to StringArray(Utf8) [arrow-rs]

2025-03-31 Thread via GitHub
XiangpengHao commented on issue #7350: URL: https://github.com/apache/arrow-rs/issues/7350#issuecomment-2766461872 > we add new new ByteView to support 8bytes prefix I think Arrow spec says we need to do 4 bytes prefix: https://arrow.apache.org/docs/format/Columnar.html#variable-size-

Re: [I] StringArrayView(Utf8View) slower cases compare to StringArray(Utf8) [arrow-rs]

2025-03-31 Thread via GitHub
alamb commented on issue #7350: URL: https://github.com/apache/arrow-rs/issues/7350#issuecomment-2766464995 I do think theoretically StringArray is likely to be faster than StringViewArray for larger strings in many cases as it is more efficient (it has fewer indirections) > Another

Re: [PR] GH-36411: [C++][Python] Use meson-python for PyArrow build system [arrow]

2025-03-31 Thread via GitHub
github-actions[bot] commented on PR #45854: URL: https://github.com/apache/arrow/pull/45854#issuecomment-2766480274 Revision: 8372b0e91d2d8f7eb6c4d449d369d64fe2eccd30 Submitted crossbow builds: [ursacomputing/crossbow @ actions-2e5da2b119](https://github.com/ursacomputing/crossbow/bra

[I] [EPIC] Improve performance of parquet filter pushdown [arrow-rs]

2025-03-31 Thread via GitHub
alamb opened a new issue, #7362: URL: https://github.com/apache/arrow-rs/issues/7362 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** We are trying to speed up [`RowFilter`](https://docs.rs/parquet/latest/parquet/arrow/arrow_re

Re: [I] [C++][Dataset] Parquet schema lost on dataset write [arrow]

2025-03-31 Thread via GitHub
mapleFU commented on issue #45969: URL: https://github.com/apache/arrow/issues/45969#issuecomment-2765541325 Emmm can this be solved by direct mapping arrow type / extension type to specific Parquet Logical Type (like json and uuid) ? I'm ok to add the api to pass the full parquet sch

Re: [I] [C++][Dataset] Parquet schema lost on dataset write [arrow]

2025-03-31 Thread via GitHub
pitrou commented on issue #45969: URL: https://github.com/apache/arrow/issues/45969#issuecomment-2765508966 This would probably need to be added to the Arrow-Parquet C++ APIs first. I see two possible kinds of API: 1) expose an additional parameter to pass the full Parquet schema (we woul

Re: [I] [Ruby] Unify test for table in raw_records and each_raw_record [arrow]

2025-03-31 Thread via GitHub
otegami commented on issue #45897: URL: https://github.com/apache/arrow/issues/45897#issuecomment-2765636241 take

Re: [PR] GH-45732: [C++][Compute] Accept more pivot key types [arrow]

2025-03-31 Thread via GitHub
kszucs commented on PR #45945: URL: https://github.com/apache/arrow/pull/45945#issuecomment-2765664670 > By the way, this PR (the move to a `Grouper`-based pivot implementation) results in a ~15% decrease in `hash_pivot_wider` performance in the simplistic benchmarks I posted in #45741 .

[PR] GH-45897: [Ruby] Unify test for table in raw_records and each_raw_record [arrow]

2025-03-31 Thread via GitHub
otegami opened a new pull request, #45977: URL: https://github.com/apache/arrow/pull/45977 ### Rationale for this change The PR reduces duplicated test cases and ensures that both `raw_records` and `each_raw_record` behave consistently by extracting their common test cases. - `Arro

Re: [PR] Bump `object_store` to `0.12.0` [arrow-rs]

2025-03-31 Thread via GitHub
mbrobbel commented on PR #7328: URL: https://github.com/apache/arrow-rs/pull/7328#issuecomment-2765772331 > Is this PR ready to go? Looks like > > * [Change Parquet API interaction for u64 #7252](https://github.com/apache/arrow-rs/pull/7252) > has feedback but no action f

Re: [PR] GH-36411: [C++][Python] Use meson-python for PyArrow build system [arrow]

2025-03-31 Thread via GitHub
WillAyd commented on code in PR #45854: URL: https://github.com/apache/arrow/pull/45854#discussion_r2021264915 ## ci/scripts/python_build.sh: ## @@ -89,7 +136,37 @@ pushd ${python_build_dir} # on Debian/Ubuntu (ARROW-15243). # - Cannot use build isolation as we want to use s

Re: [PR] GH-36411: [C++][Python] Use meson-python for PyArrow build system [arrow]

2025-03-31 Thread via GitHub
pitrou commented on code in PR #45854: URL: https://github.com/apache/arrow/pull/45854#discussion_r2021253688 ## dev/release/02-source-test.rb: ## @@ -84,7 +84,12 @@ def test_csharp_git_commit_information def test_python_version source Dir.chdir("#{@tag_name_no_rc}/
