Re: [PR] Replace `ArrayData` with direct Array construction [arrow-rs]

2026-02-05 Thread via GitHub
jhorstmann commented on code in PR #9338: URL: https://github.com/apache/arrow-rs/pull/9338#discussion_r2769299579 ## arrow-array/src/array/boolean_array.rs: ## @@ -520,8 +520,10 @@ impl BooleanArray { let data = val_builder.as_slice_mut(); let null_slice = n

Re: [I] [Parquet] Prototype ALP encoding [arrow-rs]

2026-02-05 Thread via GitHub
alamb commented on issue #8748: URL: https://github.com/apache/arrow-rs/issues/8748#issuecomment-3853840834 BTW I started working on generating example Parquet files that are encoded using ALP using the C++ implementation. See here - https://github.com/apache/arrow/pull/49154 It do

[PR] Alamb/example encoding writer [arrow]

2026-02-05 Thread via GitHub
alamb opened a new pull request, #49154: URL: https://github.com/apache/arrow/pull/49154 This builds on the following PR from @prtkgaur - https://github.com/apache/arrow/pull/48345 It contains a binary that creates files using the new ALP encoding here: - https://github.com/apach

Re: [PR] GH-32007 [Python] Support arithmetic on arrays and scalars [arrow]

2026-02-05 Thread via GitHub
pitrou commented on code in PR #48085: URL: https://github.com/apache/arrow/pull/48085#discussion_r2769319375 ## python/pyarrow/tests/test_array.py: ## @@ -4398,3 +4399,67 @@ def test_non_cpu_array(): arr.tolist() with pytest.raises(NotImplementedError): a

Re: [PR] GH-46531: [C++] Add type_singleton utility function and tests. [arrow]

2026-02-05 Thread via GitHub
pitrou commented on code in PR #47922: URL: https://github.com/apache/arrow/pull/47922#discussion_r2769349020 ## cpp/src/arrow/type_test.cc: ## @@ -50,6 +52,41 @@ TEST(TestTypeId, AllTypeIds) { ASSERT_EQ(static_cast(all_ids.size()), Type::MAX_ID); } +TEST(TestTypeSingleton

Re: [PR] GH-48868: [Doc] Document security model for the Arrow formats [arrow]

2026-02-05 Thread via GitHub
alamb commented on code in PR #48870: URL: https://github.com/apache/arrow/pull/48870#discussion_r2769381979 ## docs/source/format/Security.rst: ## @@ -0,0 +1,280 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See t

Re: [I] [Epic] Implement `RunArray` (Run Length Encoding (RLE) / Run End Encoding (REE) support) [arrow-rs]

2026-02-05 Thread via GitHub
Jefffrey commented on issue #3520: URL: https://github.com/apache/arrow-rs/issues/3520#issuecomment-3853800223 Status update: I think to close this epic we'd need the following PRs/issues at minimum: - https://github.com/apache/arrow-rs/issues/8016 - https://github.com/apache

Re: [PR] Alamb/example encoding writer [arrow]

2026-02-05 Thread via GitHub
github-actions[bot] commented on PR #49154: URL: https://github.com/apache/arrow/pull/49154#issuecomment-3853826044 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

Re: [PR] GH-48868: [Doc] Document security model for the Arrow formats [arrow]

2026-02-05 Thread via GitHub
alamb commented on code in PR #48870: URL: https://github.com/apache/arrow/pull/48870#discussion_r2769386034 ## docs/source/format/Security.rst: ## @@ -0,0 +1,278 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See t

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769392681 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut se
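The review above centers on `WriterPropertiesBuilder` and the new byte-based limit. As a rough illustration only, the sketch below assumes that a row group is closed as soon as either the existing row-count limit or the new byte limit is reached. `RowGroupLimits` and `should_flush` are hypothetical names, not the API added in PR #9357, and how buffered bytes are measured is left open.

```rust
/// Hypothetical flush policy combining a row limit with an optional byte
/// limit. Names are illustrative, not the `WriterProperties` API from the PR.
#[derive(Debug, Clone, Copy)]
struct RowGroupLimits {
    /// Close the row group once this many rows are buffered.
    max_rows: usize,
    /// Optionally also close it once the buffered data reaches this size.
    max_bytes: Option<usize>,
}

impl RowGroupLimits {
    /// Returns true once the in-progress row group should be closed:
    /// whichever limit is reached first wins.
    fn should_flush(&self, buffered_rows: usize, buffered_bytes: usize) -> bool {
        buffered_rows >= self.max_rows
            || self.max_bytes.is_some_and(|limit| buffered_bytes >= limit)
    }
}
```

With `max_bytes: None` the policy degenerates to the existing row-count behavior, which is one way an opt-in byte limit could stay backward compatible.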

Re: [I] [C++][Python] Act on existing deprecations [arrow]

2026-02-05 Thread via GitHub
AliRana30 commented on issue #49153: URL: https://github.com/apache/arrow/issues/49153#issuecomment-3854000974 Ok sure! Then I will open a PR for what would be the best solution for these deprecation issues. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
yonipeleg33 commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769417566 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769446078 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut se

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769448374 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut se

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769458573 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769462094 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769452671 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [I] Remove file-handle from object store GET operations [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on issue #18: URL: https://github.com/apache/arrow-rs-object-store/issues/18#issuecomment-3854039340 It seems that DataFusion may want to reuse file descriptors, see https://github.com/apache/datafusion/issues/19983 . I am wondering if we should evolve the `object_stor

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769467123 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on code in PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2769466888 ## src/local.rs: ## Review Comment: Also see https://github.com/apache/arrow-rs-object-store/issues/18#issuecomment-3854039340

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769479288 ## parquet/src/file/properties.rs: ## @@ -45,6 +45,8 @@ pub const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = EnabledStatistics::Pag pub const DEFAULT_WRITE_PAG

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769486009 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut se

[PR] GH-43075: [Docs][Python] Document that source parameter to IPC reader… [arrow]

2026-02-05 Thread via GitHub
aayush-1o opened a new pull request, #49152: URL: https://github.com/apache/arrow/pull/49152 …s can be a file path Thanks for opening a pull request! If this is your first pull request you can find detailed information on how to contribute here: * [New Contributor's G

Re: [PR] GH-49102: [CI] Add type checking infrastructure and CI workflow for type annotations [arrow]

2026-02-05 Thread via GitHub
rok commented on PR #48618: URL: https://github.com/apache/arrow/pull/48618#issuecomment-3852566803 I rebased on main to fix the docs CI issues. I'd like to continue on the annotation PRs next week so I'll merge Sunday evening.

Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-02-05 Thread via GitHub
SmritiAgrawal04 closed pull request #552: Whitelisting Onelake API & Workspace PL FQDNs URL: https://github.com/apache/arrow-rs-object-store/pull/552

Re: [PR] parquet: reuse utf8_validation_buffer [arrow-rs]

2026-02-05 Thread via GitHub
alamb commented on PR #9317: URL: https://github.com/apache/arrow-rs/pull/9317#issuecomment-3853317064 🤔 the benchmarks seem to show a slowdown ``` group buffer_reuse

Re: [PR] parquet: reuse utf8_validation_buffer [arrow-rs]

2026-02-05 Thread via GitHub
alamb commented on PR #9317: URL: https://github.com/apache/arrow-rs/pull/9317#issuecomment-3853317529 run benchmark arrow_reader

Re: [PR] parquet: reuse utf8_validation_buffer [arrow-rs]

2026-02-05 Thread via GitHub
alamb-ghbot commented on PR #9317: URL: https://github.com/apache/arrow-rs/pull/9317#issuecomment-3853318163 🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~2

Re: [I] rust snowflake: bulk ingest with the ADBC_INGEST_OPTION_MODE_CREATE_APPEND mode does not fail during schema updates [arrow-adbc]

2026-02-05 Thread via GitHub
Pranav2612000 commented on issue #3945: URL: https://github.com/apache/arrow-adbc/issues/3945#issuecomment-3852374416 This seems to be the default Snowflake behaviour ( https://community.snowflake.com/s/article/How-to-block-uploads-if-the-schema-of-a-Parquet-file-and-the-schema-of-a-table-do

Re: [PR] GH-41863: [Python][Parquet] Support lz4_raw as a compression name alias [arrow]

2026-02-05 Thread via GitHub
nwoolmer commented on PR #49135: URL: https://github.com/apache/arrow/pull/49135#issuecomment-3852372020 Thank you!

Re: [PR] GH-49129: [R][Parquet] Guard Parquet dataset code with ARROW_R_WITH_PARQUET [arrow]

2026-02-05 Thread via GitHub
IsabelParedes commented on PR #49128: URL: https://github.com/apache/arrow/pull/49128#issuecomment-3852304514 > One quick note about a slightly weird bit of the package's inner workings. In the `r/data-raw` directory there is a file called `codegen.R`, which generates the code in `r/src/arr

[I] Add fuzz regression testing to parquet/arrow/csv readers [arrow-rs]

2026-02-05 Thread via GitHub
alamb opened a new issue, #9358: URL: https://github.com/apache/arrow-rs/issues/9358 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The arrow-testing repository has several data files that caused issues with the C/C++ implementat
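A minimal sketch of what such a regression check could look like, assuming each known-bad file is fed to a reader closure that is expected to return `Err` rather than panic. `inputs_that_panic` and the in-memory corpus layout are hypothetical; a real test would wrap the Parquet, IPC, or CSV readers and load the files from arrow-testing.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Feeds every (name, bytes) entry of a corpus to `parse` and returns the
/// names of the inputs that made it panic. Returning `Ok` or `Err` is fine;
/// only a panic counts as a regression. Relies on the default
/// `panic = "unwind"` strategy, since `catch_unwind` cannot catch aborts.
fn inputs_that_panic<T, E>(
    corpus: &[(&str, &[u8])],
    parse: impl Fn(&[u8]) -> Result<T, E>,
) -> Vec<String> {
    let mut failures = Vec::new();
    for &(name, bytes) in corpus {
        let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
            // The result itself is irrelevant here; we only care that it returns.
            let _ = parse(bytes);
        }));
        if outcome.is_err() {
            failures.push(name.to_string());
        }
    }
    failures
}
```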

Re: [PR] fix(azure): correct Microsoft Fabric blob endpoint domain [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum merged PR #631: URL: https://github.com/apache/arrow-rs-object-store/pull/631

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on code in PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2768976704 ## src/lib.rs: ## @@ -1650,22 +1648,11 @@ impl GetResult { #[cfg(all(feature = "fs", not(target_arch = "wasm32")))] GetResultP

Re: [I] `LocalFileSystem`: use `read_at` instead of seek + read [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum closed issue #622: `LocalFileSystem`: use `read_at` instead of seek + read URL: https://github.com/apache/arrow-rs-object-store/issues/622
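For readers unfamiliar with the change, here is a minimal Unix-only sketch contrasting the two patterns, using `std::os::unix::fs::FileExt::read_exact_at` (a `pread`-style positional read). The helper names are hypothetical and this is not the `object_store` code; on Windows the closest analogue is `std::os::windows::fs::FileExt::seek_read`.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::os::unix::fs::FileExt;

/// Old pattern: move the file cursor, then read. Needs `&mut File` and at
/// least two syscalls (lseek + read), and leaves the cursor at `range.end`.
fn read_range_seek(file: &mut File, range: Range<u64>) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; (range.end - range.start) as usize];
    file.seek(SeekFrom::Start(range.start))?;
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// New pattern: a positional read at an explicit offset. It only needs
/// `&File` and never moves the cursor.
fn read_range_at(file: &File, range: Range<u64>) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; (range.end - range.start) as usize];
    file.read_exact_at(&mut buf, range.start)?;
    Ok(buf)
}
```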

Re: [I] [C++][Python] Act on existing deprecations [arrow]

2026-02-05 Thread via GitHub
pitrou commented on issue #49153: URL: https://github.com/apache/arrow/issues/49153#issuecomment-3853448458 Hi @AliRana30 , yes, of course, contributions from non-maintainers are always welcome! You can also tackle a subset of all deprecations, according to what you're comfortable with.

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on code in PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2769002099 ## src/local.rs: ## Review Comment: Technically we could push this even further (maybe in a follow up): `read_at` doesn't modify the current read
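Because a positional read leaves the cursor alone and only needs `&File`, one handle can serve several reads at once. The sketch below (Unix-only, hypothetical helper name) spawns scoped threads that read different ranges from the same `File` without a lock; whether `object_store` should exploit this is the open question in the thread.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::thread;

/// Reads several (offset, len) ranges from one shared `File` in parallel.
/// `read_exact_at` takes `&self` and does not touch the cursor, so the
/// threads need no mutex and cannot disturb each other's offsets.
fn read_ranges_concurrently(file: &File, ranges: &[(u64, usize)]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(offset, len)| {
                s.spawn(move || {
                    let mut buf = vec![0u8; len];
                    file.read_exact_at(&mut buf, offset).map(|_| buf)
                })
            })
            .collect();
        // Results come back in the same order as `ranges`.
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```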

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum merged PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628

Re: [I] [C++][Python] Act on existing deprecations [arrow]

2026-02-05 Thread via GitHub
AliRana30 commented on issue #49153: URL: https://github.com/apache/arrow/issues/49153#issuecomment-3854082662 Hi @pitrou, I've analyzed the C++ codebase and found **7 groups of deprecated APIs** from v13.0.0 to v24.0.0. ## High Priority Removals (3+ years old): 1. `HasValidityB

[PR] Add regression tests for Parquet large binary offset overflow [arrow-rs]

2026-02-05 Thread via GitHub
vigneshsiva11 opened a new pull request, #9361: URL: https://github.com/apache/arrow-rs/pull/9361 # Which issue does this PR close? - Refs #7973 # Rationale for this change This PR adds regression coverage for an offset overflow panic encountered when reading Parquet fil

Re: [I] Error when reading row group larger than 2GB (total string length per 8k row batch exceeds 2GB) [arrow-rs]

2026-02-05 Thread via GitHub
vigneshsiva11 commented on issue #7973: URL: https://github.com/apache/arrow-rs/issues/7973#issuecomment-3854131064 I’ve opened a PR adding regression tests. The PR adds test coverage for large binary columns across multiple Parquet encodings (PLAIN, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYT

Re: [PR] fix(wasm): avoid std::time::Instant::now() and expand testing [arrow-rs-object-store]

2026-02-05 Thread via GitHub
kylebarron commented on PR #625: URL: https://github.com/apache/arrow-rs-object-store/pull/625#issuecomment-3854144711 Perhaps we should have a `web` named feature flag, and not automatically assume that `wasm32-unknown-unknown` will be run in the browser? E.g. https://github.com/cloudflar

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
alamb commented on code in PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#discussion_r2769529134 ## src/aws/checksum.rs: ## @@ -24,12 +24,15 @@ use std::str::FromStr; pub enum Checksum { /// SHA-256 algorithm. SHA256, +/// CRC64-NVME algor
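For reference, CRC-64/NVME is a reflected 64-bit CRC with polynomial `0xAD93D23594C93659`, an all-ones initial value, and an all-ones final XOR. The bitwise sketch below is for illustration only and is not the implementation the PR adopts; production code would use a table-driven or SIMD variant. The `update`/`finalize` split mirrors how a checksum is accumulated over a streamed upload.

```rust
/// CRC-64/NVME polynomial 0xAD93D23594C93659 with its bits reversed,
/// as needed for the LSB-first (reflected) algorithm.
const POLY_REFLECTED: u64 = 0x9A6C_9329_AC4B_C9B5;

/// Streaming CRC-64/NVME state.
struct Crc64Nvme {
    state: u64,
}

impl Crc64Nvme {
    fn new() -> Self {
        Crc64Nvme { state: u64::MAX } // init = all ones
    }

    /// Feeds more bytes; may be called repeatedly, e.g. once per chunk.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= byte as u64;
            for _ in 0..8 {
                // All ones if the low bit is set, else zero (branch-free).
                let mask = 0u64.wrapping_sub(self.state & 1);
                self.state = (self.state >> 1) ^ (POLY_REFLECTED & mask);
            }
        }
    }

    fn finalize(&self) -> u64 {
        self.state ^ u64::MAX // xorout = all ones
    }
}

/// One-shot convenience wrapper.
fn crc64_nvme(data: &[u8]) -> u64 {
    let mut c = Crc64Nvme::new();
    c.update(data);
    c.finalize()
}
```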

Re: [PR] Add regression tests for Parquet large binary offset overflow [arrow-rs]

2026-02-05 Thread via GitHub
Copilot commented on code in PR #9361: URL: https://github.com/apache/arrow-rs/pull/9361#discussion_r2769569636 ## parquet/tests/arrow_reader/large_string_overflow.rs: ## @@ -0,0 +1,116 @@ +use std::sync::Arc; Review Comment: This file is missing the Apache License header th

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769572896 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -187,8 +187,11 @@ pub struct ArrowWriter { /// Creates new [`ArrowRowGroupWriter`] instances as required row

[PR] Fix `ToArrow` with non-one-based indices [arrow-julia]

2026-02-05 Thread via GitHub
nalimilan opened a new pull request, #587: URL: https://github.com/apache/arrow-julia/pull/587 This generates errors with `CategoricalArray` due to its internal `CategoricalRefPool` type. Also drop `eltype`, which isn't needed as there is an `AbstractArray` fallback. Fixes #58

Re: [I] [C++][Python] Act on existing deprecations [arrow]

2026-02-05 Thread via GitHub
pitrou commented on issue #49153: URL: https://github.com/apache/arrow/issues/49153#issuecomment-3853003927 @AlenkaF @thisisnic @rok I don't know if one of you would like to work on this.

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on code in PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#discussion_r2769013746 ## Cargo.toml: ## @@ -47,6 +47,7 @@ walkdir = { version = "2", optional = true } # Cloud storage support base64 = { version = "0.22", default-featur

Re: [PR] build(deps): update nix requirement from 0.30.0 to 0.31.1 [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on PR #616: URL: https://github.com/apache/arrow-rs-object-store/pull/616#issuecomment-3853509132 @dependabot rebase

Re: [PR] Implement typos-cli [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on PR #570: URL: https://github.com/apache/arrow-rs-object-store/pull/570#issuecomment-3853503409 Sorry for the late reply. Since `main` has advanced since then, could you rebase this one on top of `main`? Just to make sure the current state is still passing.

Re: [I] Error when reading row group larger than 2GB (total string length per 8k row batch exceeds 2GB) [arrow-rs]

2026-02-05 Thread via GitHub
vigneshsiva11 commented on issue #7973: URL: https://github.com/apache/arrow-rs/issues/7973#issuecomment-3853537062 Thanks @alamb for the confirmation! I’ll start by adding regression tests that reproduce the overflow for large string/binary columns, and I’ll include coverage for the

Re: [PR] Pluggable Crypto / Update reqwest 0.13 [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on code in PR #585: URL: https://github.com/apache/arrow-rs-object-store/pull/585#discussion_r2769090670 ## .github/workflows/ci.yml: ## @@ -39,6 +39,10 @@ jobs: - uses: actions/checkout@v6 - name: Setup Clippy run: rustup component a

Re: [PR] build(deps): update nix requirement from 0.30.0 to 0.31.1 [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum merged PR #616: URL: https://github.com/apache/arrow-rs-object-store/pull/616

Re: [PR] fix: missing 5xx error body when retry exhausted [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on PR #618: URL: https://github.com/apache/arrow-rs-object-store/pull/618#issuecomment-3853590477 CI got fixed in the meantime, can you please rebase?

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
Dandandan commented on code in PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2769144152 ## src/local.rs: ## Review Comment: E.g. https://github.com/apache/datafusion/issues/19983

Re: [PR] GH-48868: [Doc] Document security model for the Arrow formats [arrow]

2026-02-05 Thread via GitHub
pitrou commented on code in PR #48870: URL: https://github.com/apache/arrow/pull/48870#discussion_r2769137993 ## docs/source/format/Security.rst: ## @@ -0,0 +1,280 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
Dandandan commented on code in PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2769134923 ## src/local.rs: ## Review Comment: Yes this would allow it, but I think it probably needs to be controlled from somewhere else to avoid the overhe

Re: [I] Remove file-handle from object store GET operations [arrow-rs-object-store]

2026-02-05 Thread via GitHub
AdamGS commented on issue #18: URL: https://github.com/apache/arrow-rs-object-store/issues/18#issuecomment-3854863860 I would also love this. I think a file-like API makes a lot of sense for some perf-sensitive stuff locally, and works perfectly well remotely.

Re: [PR] GH-49155: [C++][IPC] Allow disabling extension type deserialization [arrow]

2026-02-05 Thread via GitHub
github-actions[bot] commented on PR #49157: URL: https://github.com/apache/arrow/pull/49157#issuecomment-3854773094 :warning: GitHub issue #49155 **has been automatically assigned in GitHub** to PR creator.

Re: [I] [C++] Allow disabling extension type deserialization when reading IPC [arrow]

2026-02-05 Thread via GitHub
pitrou commented on issue #49155: URL: https://github.com/apache/arrow/issues/49155#issuecomment-3854800525 @AliRana30 I think that the API should be discussed first. I see two main possibilities: 1. a boolean flag (such as `bool recreate_extensions = true`) 2. an allowlist of allowed
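To make the two options concrete, here is a hypothetical policy type, written in Rust for illustration even though the change itself targets the C++ IPC reader. `ExtensionPolicy` and `should_recreate` are invented names: option 1 amounts to choosing between `AllRegistered` and `Disabled`, option 2 to `Allow`. The assumption is that a field which is not recreated would presumably keep its storage type, as unregistered extension types already do.

```rust
use std::collections::HashSet;

/// Hypothetical reader option covering both proposals.
#[derive(Debug, Clone)]
enum ExtensionPolicy {
    /// Recreate every registered extension type (flag = true).
    AllRegistered,
    /// Never recreate extension types (flag = false).
    Disabled,
    /// Recreate only the listed extension names (allowlist).
    Allow(HashSet<String>),
}

impl ExtensionPolicy {
    /// Decides whether a field tagged with `extension_name` should be rebuilt
    /// as an extension type rather than left as its storage type.
    fn should_recreate(&self, extension_name: &str) -> bool {
        match self {
            ExtensionPolicy::AllRegistered => true,
            ExtensionPolicy::Disabled => false,
            ExtensionPolicy::Allow(names) => names.contains(extension_name),
        }
    }
}
```

An allowlist subsumes the flag (an empty set disables everything), at the cost of a slightly larger API surface.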

Re: [I] [C++] Allow disabling extension type deserialization when reading IPC [arrow]

2026-02-05 Thread via GitHub
AliRana30 commented on issue #49155: URL: https://github.com/apache/arrow/issues/49155#issuecomment-3854883611 Apologies for the wrong PR and commits. I will wait for the maintainers' opinion before creating the new PR.

Re: [I] Support `BinaryView` in `bit_length` kernel [arrow-rs]

2026-02-05 Thread via GitHub
Abhisheklearn12 commented on issue #9351: URL: https://github.com/apache/arrow-rs/issues/9351#issuecomment-3854922805 take

Re: [PR] GH-1007: fix: does not break class loading if direct buffer allocator is not available [arrow-java]

2026-02-05 Thread via GitHub
torito closed pull request #1008: GH-1007: fix: does not break class loading if direct buffer allocator is not available URL: https://github.com/apache/arrow-java/pull/1008

Re: [PR] parquet: reuse utf8_validation_buffer [arrow-rs]

2026-02-05 Thread via GitHub
alamb-ghbot commented on PR #9317: URL: https://github.com/apache/arrow-rs/pull/9317#issuecomment-3855002079 🤖: Benchmark completed Details ``` group buffer_reuse

Re: [PR] GH-49064: [C++] Fix thread safety in DictionaryArray::dictionary() [arrow]

2026-02-05 Thread via GitHub
AliRana30 closed pull request #49080: GH-49064: [C++] Fix thread safety in DictionaryArray::dictionary() URL: https://github.com/apache/arrow/pull/49080

Re: [PR] Optimize `from_bitwise_unary_op` [arrow-rs]

2026-02-05 Thread via GitHub
Dandandan commented on PR #9297: URL: https://github.com/apache/arrow-rs/pull/9297#issuecomment-3854819862 run benchmark boolean_kernels

Re: [PR] GH-49155: [C++][IPC] Allow disabling extension type deserialization [arrow]

2026-02-05 Thread via GitHub
AliRana30 closed pull request #49157: GH-49155: [C++][IPC] Allow disabling extension type deserialization URL: https://github.com/apache/arrow/pull/49157

Re: [I] [C++][Python] Act on existing deprecations [arrow]

2026-02-05 Thread via GitHub
pitrou commented on issue #49153: URL: https://github.com/apache/arrow/issues/49153#issuecomment-3854901225 @AliRana30 For the record, did you use AI to generate this? It can be ok to use AI as long as you understand [our recommendations about this](https://arrow.apache.org/docs/dev/develop

Re: [PR] fix(wasm): avoid std::time::Instant::now() and expand testing [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on PR #625: URL: https://github.com/apache/arrow-rs-object-store/pull/625#issuecomment-3854937362 Yeah, I would feel more comfortable w/ an explicit `web` feature. I think that would make it clearer (esp. if we document that, maybe in our top-level crate docs).
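A sketch of the gating being proposed, assuming a Cargo feature named `web` (hypothetical here, declared as `web = []` under `[features]`). It uses `cfg!` so every branch compiles on every target; real browser-only code would sit behind `#[cfg(...)]` items instead, because it pulls in browser bindings that do not build natively.

```rust
/// Reports which timing backend this build would use, decided at compile time.
/// `feature = "web"` is the hypothetical opt-in flag discussed above.
fn timer_backend() -> &'static str {
    if cfg!(all(target_arch = "wasm32", feature = "web")) {
        // Explicitly opted in: browser timing APIs may be assumed.
        "browser"
    } else if cfg!(target_arch = "wasm32") {
        // wasm32 without `web`: do not assume a browser, and avoid
        // std::time::Instant::now(), which is what PR #625 works around.
        "wasm-no-browser"
    } else {
        // Native targets: std::time::Instant is available.
        "native"
    }
}
```

The design point is that `target_arch = "wasm32"` alone says nothing about the host environment, so the browser assumption becomes an explicit opt-in.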

Re: [I] Remove file-handle from object store GET operations [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on issue #18: URL: https://github.com/apache/arrow-rs-object-store/issues/18#issuecomment-3854944404 Alright, then I'm gonna draft that as a PR.

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
crepererum commented on PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#issuecomment-3854952440 Regarding the crate of choice: Since we -- I think -- have decided that this is a breaking change, I think we should merge this AFTER #585 and hook up the crc calculation

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
yonipeleg33 commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769983367 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
orlp commented on code in PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#discussion_r2770278559 ## src/aws/checksum.rs: ## @@ -24,12 +24,15 @@ use std::str::FromStr; pub enum Checksum { Review Comment: @alamb If the enum is marked as `#[non_exhausti
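Background on the attribute under discussion, using a stand-in enum that is not necessarily how `object_store`'s `Checksum` is declared. `#[non_exhaustive]` forces any `match` written outside the defining crate to include a wildcard arm, which is what lets a variant such as CRC64-NVME be added without a semver-breaking release.

```rust
/// Stand-in enum for illustration only.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Checksum {
    Sha256,
    Crc64Nvme,
}

/// Written the way a downstream crate has to write it: outside the defining
/// crate, omitting the `_` arm is a compile error (E0004). Inside the
/// defining crate, as here, the arm is unreachable, hence the `allow`.
#[allow(unreachable_patterns)]
pub fn algorithm_name(c: Checksum) -> Option<&'static str> {
    match c {
        Checksum::Sha256 => Some("SHA256"),
        Checksum::Crc64Nvme => Some("CRC64NVME"),
        // Variants added in a later release fall through here downstream.
        _ => None,
    }
}
```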

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
orlp commented on PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#issuecomment-3855052387 @crepererum I don't see why, CRC has nothing to do with cryptography.

[PR] Parquet: prevent binary offset overflow by stopping batch early [arrow-rs]

2026-02-05 Thread via GitHub
vigneshsiva11 opened a new pull request, #9362: URL: https://github.com/apache/arrow-rs/pull/9362 # Which issue does this PR close? - Closes #7973. # Rationale for this change When reading Parquet files containing very large binary or string values, the Arrow Parquet rea
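The idea in the PR title can be sketched as follows. `Utf8`/`Binary` arrays use `i32` offsets, so the total value bytes in one batch must stay at or below `i32::MAX` (about 2 GiB); with the 8k-row batches from #7973 that limit is crossed once values average roughly 256 KiB (2^31 / 8192 = 262,144 bytes). `rows_that_fit` is a hypothetical helper, not the PR's code, that ends a batch just before the offsets would overflow.

```rust
/// Given the byte lengths of upcoming values, returns how many of them (at
/// most `max_rows`) fit in one batch without the running total exceeding
/// what `i32` offsets can address. `LargeUtf8`/`LargeBinary` use `i64`
/// offsets and would not need this check.
fn rows_that_fit(value_lens: &[usize], max_rows: usize) -> usize {
    let limit = i32::MAX as usize;
    let mut total = 0usize;
    for (i, &len) in value_lens.iter().take(max_rows).enumerate() {
        match total.checked_add(len) {
            Some(t) if t <= limit => total = t,
            // Adding this value would overflow the offsets: end the batch here.
            _ => return i,
        }
    }
    value_lens.len().min(max_rows)
}
```

A return value of 0 means a single value exceeds the limit by itself; a real reader would have to report an error in that case rather than emit empty batches forever.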

Re: [I] [C++][Python] Act on existing deprecations [arrow]

2026-02-05 Thread via GitHub
AliRana30 commented on issue #49153: URL: https://github.com/apache/arrow/issues/49153#issuecomment-3855059810 Yes! I have used AI, but I will read all the guidelines before committing any changes and fixes.

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2770294974 ## parquet/src/file/properties.rs: ## @@ -575,7 +595,34 @@ impl WriterPropertiesBuilder { /// If the value is set to 0. pub fn set_max_row_group_size(mut se

Re: [PR] Optimize `from_bitwise_unary_op` [arrow-rs]

2026-02-05 Thread via GitHub
alamb-ghbot commented on PR #9297: URL: https://github.com/apache/arrow-rs/pull/9297#issuecomment-3855066903 🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~2

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2770305033 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [PR] Optimize `from_bitwise_unary_op` [arrow-rs]

2026-02-05 Thread via GitHub
alamb-ghbot commented on PR #9297: URL: https://github.com/apache/arrow-rs/pull/9297#issuecomment-3855089782 🤖: Benchmark completed Details ``` groupmain optimize_from_bitwise_unary_op -

Re: [PR] fix: does not break class loading if direct buffer allocator is not available [arrow-java]

2026-02-05 Thread via GitHub
torito closed pull request #1006: fix: does not break class loading if direct buffer allocator is not available URL: https://github.com/apache/arrow-java/pull/1006

[PR] GH-1007: fix: does not break class loading if direct buffer allocator is not available [arrow-java]

2026-02-05 Thread via GitHub
torito opened a new pull request, #1008: URL: https://github.com/apache/arrow-java/pull/1008 ## What's Changed The Direct Buffer is not always needed to use Arrow memory, however, we cannot load MemoryUtil class if we don't set: ``` --add-opens=java.base/java.nio=org.apache.arro

Re: [PR] GH-1007: fix: does not break class loading if direct buffer allocator is not available [arrow-java]

2026-02-05 Thread via GitHub
github-actions[bot] commented on PR #1008: URL: https://github.com/apache/arrow-java/pull/1008#issuecomment-3854565801 Thank you for opening a pull request! Please label the PR with one or more of: - bug-fix - chore - dependencies - documentation - enhancemen

Re: [PR] Use platform specific `read_at` when available [arrow-rs-object-store]

2026-02-05 Thread via GitHub
Dandandan commented on code in PR #628: URL: https://github.com/apache/arrow-rs-object-store/pull/628#discussion_r2769689847 ## src/local.rs: ## @@ -988,26 +988,79 @@ pub(crate) fn read_range( // Don't read past end of file let to_read = range.end.min(file_len) - ra

Re: [I] Add `ListViewArray` and `LargeListViewArray` [arrow-rs]

2026-02-05 Thread via GitHub
alamb closed issue #5375: Add `ListViewArray` and `LargeListViewArray` URL: https://github.com/apache/arrow-rs/issues/5375

Re: [PR] doc: remove disclaimer about `ListView` not being fully supported [arrow-rs]

2026-02-05 Thread via GitHub
alamb merged PR #9356: URL: https://github.com/apache/arrow-rs/pull/9356

Re: [I] [C++] Allow disabling extension type deserialization when reading IPC [arrow]

2026-02-05 Thread via GitHub
AliRana30 commented on issue #49155: URL: https://github.com/apache/arrow/issues/49155#issuecomment-3854636174 @pitrou I will create a PR for this soon!!

Re: [I] Remove file-handle from object store GET operations [arrow-rs-object-store]

2026-02-05 Thread via GitHub
Dandandan commented on issue #18: URL: https://github.com/apache/arrow-rs-object-store/issues/18#issuecomment-3854349661 That makes quite a bit of sense!

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769653185 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769659979 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(

Re: [I] Support Arrow C Stream interface containing stream of `Array` [arrow-rs]

2026-02-05 Thread via GitHub
kylebarron commented on issue #6586: URL: https://github.com/apache/arrow-rs/issues/6586#issuecomment-3854326646 I think that makes sense. I've written essentially all of that inside of [`pyo3-arrow`](https://docs.rs/pyo3-arrow). E.g. https://github.com/kylebarron/arro3/blob/0c59fc9

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
orlp commented on PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#issuecomment-3854301124 But to put some *paper napkin* numbers to this... Due to the way the code is currently architected (this might be fundamental if the hash has to be in the header, I don't know

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
alamb commented on PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#issuecomment-3854655305 I did some research on this crc-fast crate: https://crates.io/crates/crc-fast It does look to be primarily the work of one individual, which is always a little concerning from a

Re: [PR] Add CRC64NVME checksum support [arrow-rs-object-store]

2026-02-05 Thread via GitHub
orlp commented on PR #633: URL: https://github.com/apache/arrow-rs-object-store/pull/633#issuecomment-3854182244 @alamb I don't have any publicly available benchmarks, but @kdn36 measured that for one of our workflows in Polars Cloud it saved 25% *end-to-end*.

Re: [PR] Move row_filter async tests from parquet async reader [arrow-rs]

2026-02-05 Thread via GitHub
alamb merged PR #9355: URL: https://github.com/apache/arrow-rs/pull/9355

Re: [I] [Doc] Document security model for the Arrow formats [arrow]

2026-02-05 Thread via GitHub
pitrou commented on issue #48868: URL: https://github.com/apache/arrow/issues/48868#issuecomment-3854460532 Issue resolved by pull request 48870 https://github.com/apache/arrow/pull/48870

Re: [PR] GH-48868: [Doc] Document security model for the Arrow formats [arrow]

2026-02-05 Thread via GitHub
pitrou merged PR #48870: URL: https://github.com/apache/arrow/pull/48870

Re: [PR] [Parquet] Support skipping pages with mask based evaluation [arrow-rs]

2026-02-05 Thread via GitHub
sdf-jkl commented on PR #9118: URL: https://github.com/apache/arrow-rs/pull/9118#issuecomment-3854465947 https://github.com/user-attachments/assets/9a70c6be-e319-4958-ba67-be083a46c217 With the tests moved, this should be more readable.

Re: [PR] feat: add max_row_group_bytes option to WriterProperties [arrow-rs]

2026-02-05 Thread via GitHub
rluvaton commented on code in PR #9357: URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2769648479 ## parquet/src/arrow/arrow_writer/mod.rs: ## @@ -4518,4 +4575,185 @@ mod tests { assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024); assert_eq!(
