[PR] fix(csharp/src/Drivers/BigQuery): improve selective handling of cancellation exception [arrow-adbc]

2025-10-24 Thread via GitHub
birschick-bq opened a new pull request, #3615: URL: https://github.com/apache/arrow-adbc/pull/3615 After adding cancellation functionality to BigQuery statements, there is a report that `task was cancelled` messages are now appearing. This is likely due to the change where the retry c

Re: [PR] GH-32609: [Python] Add type annotations to PyArrow [arrow]

2025-10-24 Thread via GitHub
rok commented on PR #47609: URL: https://github.com/apache/arrow/pull/47609#issuecomment-3445160090 @dangotbanned I got pyright, mypy and ty passing in CI with the following settings: https://github.com/apache/arrow/blob/c9608d2270bf0230a8a6f270246c655d11a543f2/python/pyproject.toml#L97-

Re: [PR] optimization: IPC: pass FieldNodes around by value instead of reference [arrow-go]

2025-10-24 Thread via GitHub
pixelherodev commented on PR #543: URL: https://github.com/apache/arrow-go/pull/543#issuecomment-3444240808 > > I'll have to send a patch to the generation even further upstream, I'm guessing? 😓 > > yup. Did you re-generate the flatbuffer code for this? I tested it with a patch

Re: [PR] Add `FilterPredicate::filter_record_batch` [arrow-rs]

2025-10-24 Thread via GitHub
alamb merged PR #8693: URL: https://github.com/apache/arrow-rs/pull/8693 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected]

Re: [PR] Add `FilterPredicate::filter_record_batch` [arrow-rs]

2025-10-24 Thread via GitHub
alamb commented on PR #8693: URL: https://github.com/apache/arrow-rs/pull/8693#issuecomment-3444245490 Thank you @pepijnve

Re: [I] pyarrow.fs.FileSystem.from_uri does not refer to HDFS core-site.xml config file when resolving namenode for HDFS URL [arrow]

2025-10-24 Thread via GitHub
kszucs commented on issue #42050: URL: https://github.com/apache/arrow/issues/42050#issuecomment-3444160312 @pstrzelczak can you try to use "default" as the hostname component of the URI? According to https://github.com/apache/arrow/issues/47560 it should be handled by libhdfs.

Re: [I] Allow `FilterPredicate` instances to be reused for RecordBatches [arrow-rs]

2025-10-24 Thread via GitHub
alamb closed issue #8692: Allow `FilterPredicate` instances to be reused for RecordBatches URL: https://github.com/apache/arrow-rs/issues/8692

Re: [PR] optimization: IPC: pass FieldNodes around by value instead of reference [arrow-go]

2025-10-24 Thread via GitHub
pixelherodev commented on PR #543: URL: https://github.com/apache/arrow-go/pull/543#issuecomment-3444256139 ...ah. Regenerated files are missing the licenses.

Re: [PR] feat: async and multi-result set APIs WIP [arrow-adbc]

2025-10-24 Thread via GitHub
zeroshade commented on code in PR #3607: URL: https://github.com/apache/arrow-adbc/pull/3607#discussion_r2461762081 ## c/include/arrow-adbc/adbc.h: ## @@ -1665,6 +1869,49 @@ AdbcStatusCode AdbcConnectionGetObjects(struct AdbcConnection* connection, int d

Re: [PR] feat: async and multi-result set APIs WIP [arrow-adbc]

2025-10-24 Thread via GitHub
zeroshade commented on PR #3607: URL: https://github.com/apache/arrow-adbc/pull/3607#issuecomment-3444635805 > Awesome! Probably an async wrapper around a sync array stream in nanoarrow would help fill some of these in. Wouldn't it make more sense to just have nanoarrow implement the

Re: [PR] GH-46592: [CI][Dev][R] Add Air to pre-commit [arrow]

2025-10-24 Thread via GitHub
jonkeane commented on PR #47423: URL: https://github.com/apache/arrow/pull/47423#issuecomment-345134 > @jonkeane Went down a rabbit hole with this one, but it appears that right now, air doesn't support .Rmd files, so our vignettes. My take is that we update these so infrequently and ai

Re: [PR] feat(go/adbc/driver/bigquery): add `BIGQUERY:type` field metadata [arrow-adbc]

2025-10-24 Thread via GitHub
felipecrv commented on code in PR #3604: URL: https://github.com/apache/arrow-adbc/pull/3604#discussion_r2461681010 ## go/adbc/driver/bigquery/connection.go: ## @@ -785,6 +785,8 @@ func buildField(schema *bigquery.FieldSchema, level uint) (arrow.Field, error) { field.Nu

Re: [I] replace macos-13 github runner [arrow-java]

2025-10-24 Thread via GitHub
lriggs commented on issue #869: URL: https://github.com/apache/arrow-java/issues/869#issuecomment-3444502536 I've been looking into this. Can anyone comment on how the build/release process works? I'm guessing the rc.yml workflow is used (and this change would be made there) but I'm not cer

Re: [PR] Casting support for RunEndEncoded arrays [arrow-rs]

2025-10-24 Thread via GitHub
vegarsti commented on code in PR #8589: URL: https://github.com/apache/arrow-rs/pull/8589#discussion_r2461831132 ## arrow-cast/src/cast/run_array.rs: ## @@ -0,0 +1,262 @@ +use crate::cast::*; +use arrow_ord::partition::partition; + +/// Attempts to cast a Run-End Encoded array t

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
xhochy commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3444710611 > can any of you give my user (raulcd) access to the test.pypi pyarrow project? Done.

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
kou commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441568071 OK. @jbonofre Could you create a new issue for this and update the PR title/description for the latest changes?

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
lidavidm commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441451684 I can replicate the crash locally. I also see this: ``` java.lang.RuntimeException: No function registered with name: make_struct ``` Not sure what's happening here. L

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
lidavidm commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441480832 I don't think I have the time to investigate this. I'm at least willing to disable the tests and declare that dataset needs maintainers in order to continue...

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
lidavidm commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441474877 I can see the same in CI. I thought compute functions were automatically registered? Or did something about that change over time?

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
kou commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441492434 https://github.com/apache/arrow/pull/46261

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
kou commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441487587 Ah, we need to call `arrow::compute::Initialize()` explicitly with recent Apache Arrow C++.

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
lidavidm commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441521776 Ok, let me try that, thanks :)

Re: [PR] WIP: [Release] Verify release-22.0.0-rc1 [arrow]

2025-10-24 Thread via GitHub
raulcd merged PR #47865: URL: https://github.com/apache/arrow/pull/47865

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
pitrou commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442613073 Let's simply remove it from https://github.com/apache/arrow/blob/main/python/setup.cfg#L18-L21

Re: [PR] fix: liberal parsing of zero scale decimals [arrow-rs]

2025-10-24 Thread via GitHub
martin-g commented on code in PR #8700: URL: https://github.com/apache/arrow-rs/pull/8700#discussion_r2459835602 ## arrow-cast/src/parse.rs: ## @@ -2752,6 +2761,23 @@ mod tests { let result = parse_decimal::(s, 76, scale); assert_eq!(i, result.unwrap())

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
pitrou commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442636547 For reference (I don't expect a positive answer there, but wanted to surface the issue): https://discuss.python.org/t/expressing-project-vs-distribution-licenses-post-pep-639/90314

Re: [I] [C++][Parquet] Make compression adaptive with V2 data pages [arrow]

2025-10-24 Thread via GitHub
pitrou commented on issue #47752: URL: https://github.com/apache/arrow/issues/47752#issuecomment-3442646521 @harshkumar-2005 Perhaps we should use the same kind of vocabulary as in the IPC options: https://github.com/apache/arrow/blob/e044907842c65b6cc011447cd96d3af1f2cdcce6/cpp/src/arro

Re: [PR] feat(go/adbc/driver/bigquery): Add option to link failed jobs [arrow-adbc]

2025-10-24 Thread via GitHub
felipecrv commented on PR #3614: URL: https://github.com/apache/arrow-adbc/pull/3614#issuecomment-3443632418 > Is there any reason to make this an option vs just always including it in the error? Verbosity of the error message.

Re: [PR] check bit width to avoid panic in DeltaBitPackDecoder [arrow-rs]

2025-10-24 Thread via GitHub
etseidl commented on code in PR #8688: URL: https://github.com/apache/arrow-rs/pull/8688#discussion_r2460937794 ## parquet/src/encodings/decoding.rs: ## @@ -2091,4 +2106,45 @@ mod tests { v } } + +#[test] +fn test_delta_bit_packed_invalid_bit_w

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
Schamschula commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3443646963 > > This is blocking my update of apache-arrow and py-pyarrow for MacPorts. > > If you really require the source distribution, can't you try installing from the GitHub rele

Re: [PR] GH-47927: [Release] Fix APT repository metadata generation with new repository [arrow]

2025-10-24 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #47928: URL: https://github.com/apache/arrow/pull/47928#issuecomment-3443654081 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 0112b27aee84e3f037899660041aceb6ef4e0026. There were no

Re: [I] Cannot read Parquet files with multiple dictionary pages per column chunk [arrow-go]

2025-10-24 Thread via GitHub
zeroshade commented on issue #546: URL: https://github.com/apache/arrow-go/issues/546#issuecomment-3443686856 According to the Parquet spec, as far as I'm aware, a column chunk is only allowed to have a single dictionary page. Which means that containing multiple dictionary pages in a singl
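The single-dictionary-page rule cited in this comment can be illustrated with a small check. This is a minimal sketch and not arrow-go's reader logic: the function name `check_dictionary_pages` and the string page-type labels are hypothetical, and it assumes the Parquet format rule that a column chunk holds at most one dictionary page, placed before any data page.

```python
from typing import Sequence


def check_dictionary_pages(page_types: Sequence[str]) -> None:
    """Raise ValueError if a column chunk's page sequence breaks the rule.

    `page_types` is a hypothetical, simplified view of a column chunk:
    one label per page, e.g. "DICTIONARY_PAGE" or "DATA_PAGE".
    """
    # Positions of every dictionary page in the chunk.
    dict_positions = [i for i, t in enumerate(page_types) if t == "DICTIONARY_PAGE"]
    if len(dict_positions) > 1:
        raise ValueError(
            f"column chunk has {len(dict_positions)} dictionary pages, expected at most 1"
        )
    # A lone dictionary page must come before all data pages.
    if dict_positions and dict_positions[0] != 0:
        raise ValueError("dictionary page must be the first page of the column chunk")
```

A reader that wants to tolerate non-conforming files would need a different policy (for example, replacing the active dictionary when a second one appears); that choice is outside what this sketch shows.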

Re: [PR] optimization: IPC: pass FieldNodes around by value instead of reference [arrow-go]

2025-10-24 Thread via GitHub
zeroshade commented on PR #543: URL: https://github.com/apache/arrow-go/pull/543#issuecomment-3443691774 > I'll have to send a patch to the generation even further upstream, I'm guessing? 😓 yup. Did you re-generate the flatbuffer code for this?

Re: [I] [CI][Release][R] r-binary-packages job fails on the release candidate branch for 22.0.0 [arrow]

2025-10-24 Thread via GitHub
eitsupi commented on issue #47821: URL: https://github.com/apache/arrow/issues/47821#issuecomment-3443692469 I'm not sure if this is related to the issue, but the binary artifacts of r-libarrow are present in 22.0.0 RC0 but not in RC1 and the 22.0.0 release version. https://github.com/ap

Re: [PR] optimization: IPC: pass FieldNodes around by value instead of reference [arrow-go]

2025-10-24 Thread via GitHub
pixelherodev commented on PR #543: URL: https://github.com/apache/arrow-go/pull/543#issuecomment-3444269126 Before: ``` BenchmarkIPC/Writer/codec=plain-24 203998 7155 ns/op 8160 B/op 91 allocs/op BenchmarkIPC/Reader/codec=plain-24 171043

Re: [PR] optimization: IPC: pass FieldNodes around by value instead of reference [arrow-go]

2025-10-24 Thread via GitHub
pixelherodev commented on PR #543: URL: https://github.com/apache/arrow-go/pull/543#issuecomment-3444274290 I also haven't been able to look into the arrow/flight failure, as that test has never passed for me locally anyways...

Re: [I] [C++] Adopt alternative safe math approach [arrow]

2025-10-24 Thread via GitHub
WillAyd commented on issue #47926: URL: https://github.com/apache/arrow/issues/47926#issuecomment-3444350875 Another option may be https://github.com/foonathan/type_safe - I have no personal experience with it but noticed it in Meson's WrapDB

Re: [PR] optimization: IPC: pass FieldNodes around by value instead of reference [arrow-go]

2025-10-24 Thread via GitHub
pixelherodev commented on PR #543: URL: https://github.com/apache/arrow-go/pull/543#issuecomment-3444388969 I have more optimizations locally, but unfortunately most of them deliberately break functionality (e.g. arrowflight). Going to take a look at profiles of the benchmarks and try to im

Re: [PR] GH-47899: [Dev] Add checklist to PR template [arrow]

2025-10-24 Thread via GitHub
amoeba commented on code in PR #47916: URL: https://github.com/apache/arrow/pull/47916#discussion_r2461621876 ## .github/pull_request_template.md: ## @@ -15,6 +15,16 @@ Please remove this line and the above text before creating your pull request. ### Are there any user-facin

Re: [PR] feat: async and multi-result set APIs WIP [arrow-adbc]

2025-10-24 Thread via GitHub
paleolimbot commented on code in PR #3607: URL: https://github.com/apache/arrow-adbc/pull/3607#discussion_r2461797450 ## c/include/arrow-adbc/adbc.h: ## @@ -1665,6 +1869,49 @@ AdbcStatusCode AdbcConnectionGetObjects(struct AdbcConnection* connection, int d

Re: [PR] feat: async and multi-result set APIs WIP [arrow-adbc]

2025-10-24 Thread via GitHub
zeroshade commented on PR #3607: URL: https://github.com/apache/arrow-adbc/pull/3607#issuecomment-3444682347 > Prepare may have to do I/O, hence it should have an async variant. Added StatementPrepareAsync

Re: [PR] Casting support for RunEndEncoded arrays [arrow-rs]

2025-10-24 Thread via GitHub
vegarsti commented on code in PR #8589: URL: https://github.com/apache/arrow-rs/pull/8589#discussion_r2461823560 ## arrow-cast/src/cast/run_array.rs: ## @@ -0,0 +1,262 @@ +use crate::cast::*; +use arrow_ord::partition::partition; + +/// Attempts to cast a Run-End Encoded array t

Re: [PR] add the option to disable the part of threading in `Arrow.Table` that leads to catastrophic negative scaling [arrow-julia]

2025-10-24 Thread via GitHub
quinnj commented on PR #568: URL: https://github.com/apache/arrow-julia/pull/568#issuecomment-3444815989 I think https://github.com/apache/arrow-julia/pull/570 should fix the original issue; let's review/test/merge that and then cut the release.

Re: [PR] Fix poor performance of table reading when many record batches are involved [arrow-julia]

2025-10-24 Thread via GitHub
KristofferC commented on PR #570: URL: https://github.com/apache/arrow-julia/pull/570#issuecomment-3444841528 Before: ``` ❯ julia --project mwe.jl 0.140257 seconds (2.56 M allocations: 130.583 MiB, 16.17% gc time ❯ julia --project --threads=auto mwe.jl 32.504022 seco

Re: [PR] add the option to disable the part of threading in `Arrow.Table` that leads to catastrophic negative scaling [arrow-julia]

2025-10-24 Thread via GitHub
KristofferC closed pull request #568: add the option to disable the part of threading in `Arrow.Table` that leads to catastrophic negative scaling URL: https://github.com/apache/arrow-julia/pull/568

[PR] Fix poor performance of table reading when many record batches are involved [arrow-julia]

2025-10-24 Thread via GitHub
quinnj opened a new pull request, #570: URL: https://github.com/apache/arrow-julia/pull/570 Fixes #528. Alternative to #568 cc: @KristofferC

Re: [PR] Change some panics to errors in parquet decoder [arrow-rs]

2025-10-24 Thread via GitHub
alamb commented on PR #8602: URL: https://github.com/apache/arrow-rs/pull/8602#issuecomment-3444905857 > I think I want to start on a parquet end-to-end benchmark of some sort. The arrow_reader bench is _so_ sensitive. On my workstation I was seeing the BinaryViewArray bench taking about 10

Re: [PR] Change some panics to errors in parquet decoder [arrow-rs]

2025-10-24 Thread via GitHub
etseidl commented on PR #8602: URL: https://github.com/apache/arrow-rs/pull/8602#issuecomment-3444901295 I think I want to start on a parquet end-to-end benchmark of some sort. The arrow_reader bench is *so* sensitive. On my workstation I was seeing the BinaryViewArray bench taking about 10

Re: [PR] Change some panics to errors in parquet decoder [arrow-rs]

2025-10-24 Thread via GitHub
etseidl commented on PR #8602: URL: https://github.com/apache/arrow-rs/pull/8602#issuecomment-3444912839 > That would be amazing - I agree the benchmarks could definitely be improved. Shall I file a ticket? That's ok, I can file it.

Re: [PR] feat(arrow/extensions): add support for geoarrow.point [arrow-go]

2025-10-24 Thread via GitHub
zeroshade commented on code in PR #545: URL: https://github.com/apache/arrow-go/pull/545#discussion_r2461942047 ## arrow/extensions/geoarrow.go: ## @@ -0,0 +1,186 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See t

Re: [PR] GH-47861: [Python] reduce memory usage when using to_pandas() with many extension arrays columns [arrow]

2025-10-24 Thread via GitHub
Pear0 commented on PR #47860: URL: https://github.com/apache/arrow/pull/47860#issuecomment-3444980623 @pitrou btw I accepted your comment suggestion. Could you take another look at this PR when you get a chance?

Re: [PR] MINOR: Upgrade to Apache POM 35 and identify fixes needed to have CI happy [arrow-java]

2025-10-24 Thread via GitHub
lidavidm commented on PR #865: URL: https://github.com/apache/arrow-java/pull/865#issuecomment-3441529308 IMO, let's merge this PR as-is though? We can start unblocking other PRs and this has been around for a while

[PR] WIP: [Dataset] Initialize compute module [arrow-java]

2025-10-24 Thread via GitHub
lidavidm opened a new pull request, #893: URL: https://github.com/apache/arrow-java/pull/893 ## What's Changed TODO make an issue for this

Re: [I] [Python][Packaging] Support Python 3.14 and upload wheels [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47438: URL: https://github.com/apache/arrow/issues/47438#issuecomment-3442340930 After the vote was closed successfully for Arrow 22.0.0 release I've uploaded the wheels for Arrow 22.0.0 to PyPI, including Python 3.14 wheels. I'll close this issue now.

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
AlenkaF commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442737286 Great idea! +1 from me

Re: [PR] perf: override `count`, `nth`, `nth_back`, `last` and `max` for BitIterator [arrow-rs]

2025-10-24 Thread via GitHub
martin-g commented on code in PR #8696: URL: https://github.com/apache/arrow-rs/pull/8696#discussion_r2460028202 ## arrow-buffer/src/util/bit_iterator.rs: ## @@ -86,6 +152,27 @@ impl DoubleEndedIterator for BitIterator<'_> { let v = unsafe { get_bit_raw(self.buffer.as_p

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442749989 > This sounds good to me too. Can you post the diff of setup.cfg change here for the record? ```diff $ diff pyarrow-22.0.0 old/pyarrow-22.0.0/ Common subdirectories: pyarr

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
pitrou commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442677626 I'm certainly fine with it.

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442704141 This would be the new tar.gz generated after: ``` tar zxvf pyarrow-22.0.0.tar.gz vim pyarrow-22.0.0/setup.cfg #Remove the metadata lines on setup.cfg mv pyarrow-22.0.0.tar.g

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
rok commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442721622 This sounds good to me too. Can you post the diff of setup.cfg change here for the record?

Re: [PR] perf: override `count`, `nth`, `nth_back`, `last` and `max` for BitIterator [arrow-rs]

2025-10-24 Thread via GitHub
martin-g commented on code in PR #8696: URL: https://github.com/apache/arrow-rs/pull/8696#discussion_r2460005333 ## arrow-buffer/src/util/bit_iterator.rs: ## @@ -71,6 +72,71 @@ impl Iterator for BitIterator<'_> { let remaining_bits = self.end_offset - self.current_offse

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442668360 > Let's simply remove it from https://github.com/apache/arrow/blob/main/python/setup.cfg#L18-L21 Are you suggesting me to: - Untar the https://github.com/apache/arrow/releas

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442760951 If I use something slightly fancier like diffoscope to validate binaries reproducibility I get the file timestamp update, the filesize change and the actual modification: ``` $

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
cdce8p commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442765688 AFAIK setuptools can't include files from the parent directory. So `../LICENSE.txt` won't really work. It would need to be copied inside the `arrow/python` folder before / during the
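The workaround described in this comment, copying the license into the package folder before the build, might be sketched as follows. This is an illustration under stated assumptions, not Arrow's release tooling: the helper name `stage_license` is hypothetical, and the layout (a license file sitting one directory above the Python project) is inferred from the `../LICENSE.txt` path mentioned above.

```python
import shutil
from pathlib import Path


def stage_license(project_dir: Path, license_name: str = "LICENSE.txt") -> Path:
    """Copy the license from the parent directory into `project_dir`.

    Hypothetical pre-build step: after it runs, the license lives inside
    the project folder, so the build no longer needs a `../` path.
    """
    source = project_dir.parent / license_name
    target = project_dir / license_name
    if not source.is_file():
        raise FileNotFoundError(f"no {license_name} found in {project_dir.parent}")
    shutil.copyfile(source, target)
    return target
```

A step like this would run before the sdist is built, with the license configuration then pointing at the local copy rather than the parent directory.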

Re: [PR] fix: liberal parsing of zero scale decimals [arrow-rs]

2025-10-24 Thread via GitHub
gruuya commented on code in PR #8700: URL: https://github.com/apache/arrow-rs/pull/8700#discussion_r2460113096 ## arrow-cast/src/parse.rs: ## @@ -963,6 +963,14 @@ pub fn parse_decimal( } if !is_e_notation { +if scale == 0 && fractionals > 0 { +//

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442844902 Given we are in agreement I went ahead and tried to submit the new tar.gz, and I get the same failure. PyPI seems to be enforcing a `LICENSE.txt` file even if the license metadata isn'

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
pitrou commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442863040 @raulcd You have to rebuild the sdist after the `setup.cfg` changes. *Or* you edit `pyarrow-22.0.0/PKG-INFO` and `pyarrow-22.0.0/pyarrow.egg-info/PKG-INFO` manually.

Re: [PR] Add `FilterPredicate::filter_record_batch` [arrow-rs]

2025-10-24 Thread via GitHub
martin-g commented on code in PR #8693: URL: https://github.com/apache/arrow-rs/pull/8693#discussion_r2460148363 ## arrow-select/src/filter.rs: ## @@ -173,20 +173,17 @@ pub fn filter_record_batch( predicate: &BooleanArray, ) -> Result { let mut filter_builder = Filter

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442909582 ok, editing `PKG-INFO` fixed the issue. Final diffoscope: ``` $ diffoscope old/pyarrow-22.0.0 pyarrow-22.0.0 --- old/pyarrow-22.0.0 +++ pyarrow-22.0.0 ├── stat {} │ @@

[PR] Utilize memory allocator in ReadProperties.GetStream [arrow-go]

2025-10-24 Thread via GitHub
daniel-adam-tfs opened a new pull request, #547: URL: https://github.com/apache/arrow-go/pull/547 ### Rationale for this change Optimization of memory usage, enables the use of custom allocators when reading column data with both buffered and unbuffered readers. ### What changes ar

Re: [PR] Utilize memory allocator in ReadProperties.GetStream [arrow-go]

2025-10-24 Thread via GitHub
daniel-adam-tfs commented on code in PR #547: URL: https://github.com/apache/arrow-go/pull/547#discussion_r2460244298 ## parquet/file/page_reader.go: ## @@ -501,7 +504,16 @@ func (p *serializedPageReader) Page() Page { } func (p *serializedPageReader) decompress(rd io.Reader

Re: [I] [C++][Parquet] Make compression adaptive with V2 data pages [arrow]

2025-10-24 Thread via GitHub
harshkumar-2005 commented on issue #47752: URL: https://github.com/apache/arrow/issues/47752#issuecomment-3442180554 Hi 👋 Since there was no feedback yet regarding the threshold ratio, I’ve gone ahead with a configurable approach via `WriterProperties::compression_threshold()`.
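The idea of a configurable compression threshold can be sketched in a few lines. This is a minimal illustration, not the proposed Arrow C++ implementation: `maybe_compress` and its `threshold` parameter are hypothetical names, zlib stands in for whatever codec the writer is configured with, and the interpretation of the threshold (keep the compressed bytes only when they are at most `threshold` times the original size) is an assumption. The returned flag plays the role of the `is_compressed` field that V2 data page headers carry.

```python
import zlib


def maybe_compress(page: bytes, threshold: float = 0.9) -> tuple[bytes, bool]:
    """Return (payload, is_compressed) for one data page.

    The compressed form is kept only when it is small enough to be
    worth the decompression cost; otherwise the raw page is stored.
    """
    if not page:
        # Nothing to gain from compressing an empty page.
        return page, False
    compressed = zlib.compress(page)
    if len(compressed) <= threshold * len(page):
        return compressed, True
    return page, False
```

With this shape, highly repetitive pages are stored compressed while pages of near-random bytes are written as-is, and a reader uses the flag to decide whether to decompress.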

Re: [PR] feat: Add get_range_opts, refactor GetOptions with builder [arrow-rs-object-store]

2025-10-24 Thread via GitHub
crepererum commented on PR #517: URL: https://github.com/apache/arrow-rs-object-store/pull/517#issuecomment-3442195719 I do agree with @tustvold on the trait design: it should be lean. If I now read the PR diff correctly, this only changes the `GetOptions` to offer a builder pat

Re: [PR] feat: refactor GetOptions with builder, add binary examples [arrow-rs-object-store]

2025-10-24 Thread via GitHub
peasee commented on PR #517: URL: https://github.com/apache/arrow-rs-object-store/pull/517#issuecomment-3442224586 Yes, I forgot to update the PR title. This also adds more specific docs and examples around retrieving versioned ranges.

Re: [PR] GH-47389: [Python] CSV and JSON options lack a nice repr/str [arrow]

2025-10-24 Thread via GitHub
AlenkaF commented on PR #47397: URL: https://github.com/apache/arrow/pull/47397#issuecomment-3441437252 @pitrou mind having a quick look?

Re: [I] [Release] APT repository metadata generation is failed for new repository [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47927: URL: https://github.com/apache/arrow/issues/47927#issuecomment-3441679070 Issue resolved by pull request 47928 https://github.com/apache/arrow/pull/47928

Re: [PR] GH-47927: [Release] Fix APT repository metadata generation with new repository [arrow]

2025-10-24 Thread via GitHub
raulcd merged PR #47928: URL: https://github.com/apache/arrow/pull/47928

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442511419 I wanted to try with https://test.pypi.org/project/pyarrow/ to validate PyPI is not performing further checks and adding the LICENSE.txt will be enough but I don't have access there,

Re: [PR] feat: add bitwise ops for `BooleanBufferBuilder` and for `MutableBuffer` [arrow-rs]

2025-10-24 Thread via GitHub
alamb commented on code in PR #8619: URL: https://github.com/apache/arrow-rs/pull/8619#discussion_r2459820375 ## arrow-buffer/src/buffer/mutable_ops.rs: ## @@ -0,0 +1,1256 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] [Python][Packaging] Support Python 3.14 and upload wheels [arrow]

2025-10-24 Thread via GitHub
reneleonhardt commented on issue #47438: URL: https://github.com/apache/arrow/issues/47438#issuecomment-3442531002 Thank you very much for all your work! ❤️ As a side note, even with Arrow 22 and Python 3.14 the free-threading support is still `2 - Beta` instead of `3 - Stable` after

Re: [PR] feat: add bitwise ops for `BooleanBufferBuilder` and for `MutableBuffer` [arrow-rs]

2025-10-24 Thread via GitHub
alamb commented on code in PR #8619: URL: https://github.com/apache/arrow-rs/pull/8619#discussion_r2459823884 ## arrow-buffer/src/buffer/mutable_ops.rs: ## @@ -0,0 +1,1256 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
Schamschula commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442417641 This is blocking my update of apache-arrow and py-pyarrow for MacPorts.

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442428146 We sign wheels and the source distribution as part of our release process, see the `pyarrow-22.0.0.tar.gz` asset and ` pyarrow-22.0.0.tar.gz.asc `, ` pyarrow-22.0.0.tar.gz.sha512 `

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
raulcd commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442436372 > This is blocking my update of apache-arrow and py-pyarrow for MacPorts. If you really require the source distribution, can't you try installing from the GitHub release: ```

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
AlenkaF commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442456338 Oh, this is unfortunate. A new patch release only for this sounds silly. Is it possible that we start vote on ML to add a single file (`.txt` only) and that would make it fi

Re: [PR] feat: add bitwise ops for `BooleanBufferBuilder` and for `MutableBuffer` [arrow-rs]

2025-10-24 Thread via GitHub
alamb commented on code in PR #8619: URL: https://github.com/apache/arrow-rs/pull/8619#discussion_r2459757510 ## arrow-buffer/src/buffer/mutable_ops.rs: ## @@ -0,0 +1,1256 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] [Release][Python] PyPI rejects our source distribution due to missing LICENSE.txt [arrow]

2025-10-24 Thread via GitHub
rok commented on issue #47932: URL: https://github.com/apache/arrow/issues/47932#issuecomment-3442489268 +1 to Alenka's proposal of voting to add the needed file. As for adding the python/LICENSE.txt, do we also need python/NOTICE.txt? (see #47141)

Re: [I] Support file row number in Parquet reader [arrow-rs]

2025-10-24 Thread via GitHub
vustef commented on issue #7299: URL: https://github.com/apache/arrow-rs/issues/7299#issuecomment-3441964483 > > Because this is a special column, we need to mark it as such. We use a new extension types for this. > > I really like the idea of using an Extension type for this usecase

Re: [I] [C++] CSV reader: Ability to not infer column types. [arrow]

2025-10-24 Thread via GitHub
AlenkaF commented on issue #22232: URL: https://github.com/apache/arrow/issues/22232#issuecomment-3442010658 There is a PR up that suggests adding `default_column_type` option to the `ConvertOptions`. See: https://github.com/apache/arrow/pull/47663/files. Is there any opinion on the stat

Re: [PR] GH-47502: [C++] Introduce optional default_column_type parameter [arrow]

2025-10-24 Thread via GitHub
AlenkaF commented on PR #47663: URL: https://github.com/apache/arrow/pull/47663#issuecomment-3442005387 Thank you @vladborovtsov for the contribution. I will add info about the proposed solution in the original issue (https://github.com/apache/arrow/issues/22232) so I can see opinions fro

Re: [PR] GH-22232: [C++][Python] Introduce optional default_column_type parameter [arrow]

2025-10-24 Thread via GitHub
github-actions[bot] commented on PR #47663: URL: https://github.com/apache/arrow/pull/47663#issuecomment-3442015760 :warning: GitHub issue #22232 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-47897: [C++][Python] Allow default column type for CSV columns [arrow]

2025-10-24 Thread via GitHub
cottrell commented on PR #47898: URL: https://github.com/apache/arrow/pull/47898#issuecomment-3442030128 > Thank you for the contribution @cottrell. There is a very similar, probably also AI generated (?) PR up: #47663. It looks more complete so I propose pushing that one forward in case ot

Re: [PR] GH-47897: [C++][Python] Allow default column type for CSV columns [arrow]

2025-10-24 Thread via GitHub
cottrell closed pull request #47898: GH-47897: [C++][Python] Allow default column type for CSV columns URL: https://github.com/apache/arrow/pull/47898

Re: [PR] GH-46098 : [C++][FlightRPC] ODBC Environment Attribute Implementation [arrow]

2025-10-24 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #47760: URL: https://github.com/apache/arrow/pull/47760#issuecomment-3442074494 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 56e38362def18a5822e137c14b3d10bf7516a5e2. There were no

Re: [PR] GH-22232: [C++][Python] Introduce optional default_column_type parameter [arrow]

2025-10-24 Thread via GitHub
vladborovtsov commented on PR #47663: URL: https://github.com/apache/arrow/pull/47663#issuecomment-3442085787 Hi @AlenkaF I'm happy to continue the labour and discussion to get that merged. As for AI, it wasn't used much, although I tried :) With such huge codebase the generation qual

Re: [PR] GH-22232: [C++][Python] Introduce optional default_column_type parameter [arrow]

2025-10-24 Thread via GitHub
AlenkaF commented on PR #47663: URL: https://github.com/apache/arrow/pull/47663#issuecomment-3442107316 Happy to see a response! All good, it is totally OK to use gen AI wisely ;) I will wait for an opinion from a C++ dev and in the meantime try to look at the Python part.

Re: [I] replace macos-13 github runner [arrow-java]

2025-10-24 Thread via GitHub
lidavidm commented on issue #869: URL: https://github.com/apache/arrow-java/issues/869#issuecomment-3445143573 https://github.com/apache/arrow-java/pull/865

Re: [PR] feat(go/adbc/driver/bigquery): Add option to link failed jobs [arrow-adbc]

2025-10-24 Thread via GitHub
lidavidm commented on PR #3614: URL: https://github.com/apache/arrow-adbc/pull/3614#issuecomment-3445146104 Is it that much? I personally detest having to hunt for the "give me the real error" knob
