[GitHub] [arrow] etseidl opened a new issue, #34086: Parquet V2 page headers have incorrect number of rows

2023-02-08 Thread via GitHub
etseidl opened a new issue, #34086: URL: https://github.com/apache/arrow/issues/34086 ### Describe the bug, including details regarding any error messages, version, and platform. When writing Parquet files with version 2 page headers, the `num_rows` field is incorrect. This appears

[GitHub] [arrow-adbc] lidavidm opened a new issue, #444: [Release] Rate limit on binary downloads seems to have gotten even more stringent

2023-02-08 Thread via GitHub
lidavidm opened a new issue, #444: URL: https://github.com/apache/arrow-adbc/issues/444 As observed in the verification jobs: https://github.com/apache/arrow-adbc/actions/runs/4129494321 We might want to throttle this down even more and/or retry (with backoff) -- This is an automat
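The "retry (with backoff)" idea mentioned above could be sketched roughly as follows. This is a generic illustration, not the arrow-adbc verification script: `retry_with_backoff`, `fetch`, and all parameter names are hypothetical, and the delays are arbitrary assumptions.

```python
import random
import time


def retry_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    `fetch` is a hypothetical zero-argument callable (e.g. one download).
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay on every attempt, capped at max_delay,
            # and scale by a random factor so parallel jobs do not
            # all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting; a real script would likely catch only the specific rate-limit error rather than every exception.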

[GitHub] [arrow-adbc] lidavidm opened a new issue, #445: [Release] Verification script wheel issues

2023-02-08 Thread via GitHub
lidavidm opened a new issue, #445: URL: https://github.com/apache/arrow-adbc/issues/445 - Installing the driver manager/drivers separately causes pip to downgrade the driver manager. - The Flight SQL wheel has a lower platform tag, so it accidentally gets neglected.

[GitHub] [arrow] westonpace opened a new issue, #34088: [Python] Typo in get_writer

2023-02-08 Thread via GitHub
westonpace opened a new issue, #34088: URL: https://github.com/apache/arrow/issues/34088 ### Describe the bug, including details regarding any error messages, version, and platform. When trying to open an IPC **writer** on something that is not a file object the error is: ```

[GitHub] [arrow-flight-sql-postgresql] kou opened a new issue, #13: Add support for closing a database (session)

2023-02-08 Thread via GitHub
kou opened a new issue, #13: URL: https://github.com/apache/arrow-flight-sql-postgresql/issues/13 It seems that Apache Arrow Flight SQL doesn't provide a command that closes the current session explicitly: https://arrow.apache.org/docs/format/FlightSql.html In https://lists.apache.

[GitHub] [arrow] kou closed issue #20272: [C++] Bump version of bundled AWS SDK

2023-02-08 Thread via GitHub
kou closed issue #20272: [C++] Bump version of bundled AWS SDK URL: https://github.com/apache/arrow/issues/20272 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

[GitHub] [arrow] kou closed issue #34082: [CI][Packaging] debian-bookworm-* failed

2023-02-08 Thread via GitHub
kou closed issue #34082: [CI][Packaging] debian-bookworm-* failed URL: https://github.com/apache/arrow/issues/34082

[GitHub] [arrow] thisisnic opened a new issue, #34092: [R] bad default in `open_csv_dataset()`

2023-02-09 Thread via GitHub
thisisnic opened a new issue, #34092: URL: https://github.com/apache/arrow/issues/34092 ### Describe the bug, including details regarding any error messages, version, and platform. I was putting together an example to diagnose a user error, and got an unexpected error message:

[GitHub] [arrow] pulkomandy opened a new issue, #34093: Cross compiling in Yocto is not working

2023-02-09 Thread via GitHub
pulkomandy opened a new issue, #34093: URL: https://github.com/apache/arrow/issues/34093 ### Describe the bug, including details regarding any error messages, version, and platform. Hello, We are using Arrow on an embedded system and packaging it using Yocto. We are currently

[GitHub] [arrow] thisisnic opened a new issue, #34094: [C++] Boost minimum version needs increasing for clang16 builds

2023-02-09 Thread via GitHub
thisisnic opened a new issue, #34094: URL: https://github.com/apache/arrow/issues/34094 Although the bundled boost version was increased in #33890, the clang16 build is still failing on CRAN because the machine it's being tested on has an older version already installed, and so the bundled

[GitHub] [arrow] thisisnic opened a new issue, #34095: [R][CI] Add clang16 build to nightlies

2023-02-09 Thread via GitHub
thisisnic opened a new issue, #34095: URL: https://github.com/apache/arrow/issues/34095 ### Describe the enhancement requested CRAN is now running tests on machines with clang16 so we should too. Let's also chuck an old version of Boost on there (if that makes sense?) as this is wha

[GitHub] [arrow] thisisnic closed issue #34095: [R][CI] Add clang16 build to nightlies

2023-02-09 Thread via GitHub
thisisnic closed issue #34095: [R][CI] Add clang16 build to nightlies URL: https://github.com/apache/arrow/issues/34095

[GitHub] [arrow] ovcharenko opened a new issue, #34097: Random crash when running PyArrow from several threads

2023-02-09 Thread via GitHub
ovcharenko opened a new issue, #34097: URL: https://github.com/apache/arrow/issues/34097 ### Describe the bug, including details regarding any error messages, version, and platform. We noticed that these errors happen from time to time (about 10% of the time) after upgrading to PyArrow versi

[GitHub] [arrow] Fokko opened a new issue, #34098: Fix Dataset docstrings

2023-02-09 Thread via GitHub
Fokko opened a new issue, #34098: URL: https://github.com/apache/arrow/issues/34098 ### Describe the bug, including details regarding any error messages, version, and platform. Many `Dataset.from_{table,fragment,batches}` don't have a docstring, or they refer to `Scanner.from_table`.

[GitHub] [arrow-adbc] lidavidm opened a new issue, #448: [CI] Nightly verification job should `git tag -f`

2023-02-09 Thread via GitHub
lidavidm opened a new issue, #448: URL: https://github.com/apache/arrow-adbc/issues/448 https://github.com/apache/arrow-adbc/actions/runs/4129642169/jobs/7135468048

[GitHub] [arrow] orianED opened a new issue, #34101: [Go] pqarrow.NewSchemaManifest creates wrong schema field for array object fields

2023-02-09 Thread via GitHub
orianED opened a new issue, #34101: URL: https://github.com/apache/arrow/issues/34101 ### Describe the bug, including details regarding any error messages, version, and platform. I'm using the arrow V11 package for reading parquet files. Calling `pqarrow.NewFileReader` creates

[GitHub] [arrow-adbc] lidavidm closed issue #445: [Release] Verification script wheel issues

2023-02-09 Thread via GitHub
lidavidm closed issue #445: [Release] Verification script wheel issues URL: https://github.com/apache/arrow-adbc/issues/445

[GitHub] [arrow-adbc] lidavidm opened a new issue, #449: [Release] Add option to 06-binary-verify.sh to run the workflow from a different branch

2023-02-09 Thread via GitHub
lidavidm opened a new issue, #449: URL: https://github.com/apache/arrow-adbc/issues/449 Since we use the verification script in the checkout, not in the tarball

[GitHub] [arrow-adbc] lidavidm opened a new issue, #450: [Release] Release vote email template includes broken link to closed issues

2023-02-09 Thread via GitHub
lidavidm opened a new issue, #450: URL: https://github.com/apache/arrow-adbc/issues/450 The filter isn't quite right.

[GitHub] [arrow] westonpace closed issue #34088: [Python] Typo in get_writer

2023-02-09 Thread via GitHub
westonpace closed issue #34088: [Python] Typo in get_writer URL: https://github.com/apache/arrow/issues/34088

[GitHub] [arrow] dxe4 opened a new issue, #34104: deduplicate doesn't match docs

2023-02-09 Thread via GitHub
dxe4 opened a new issue, #34104: URL: https://github.com/apache/arrow/issues/34104 ### Describe the bug, including details regarding any error messages, version, and platform. In `pyarrow/array.pxi`, `deduplicate_objects` has a default value of true, but the docs say it's false.

[GitHub] [arrow] thisisnic opened a new issue, #34105: [R] Provide extra output for failed builds

2023-02-09 Thread via GitHub
thisisnic opened a new issue, #34105: URL: https://github.com/apache/arrow/issues/34105 ### Describe the enhancement requested When building the R package and the Arrow C++ library build fails, there's very little output unless the user has set `ARROW_R_DEV` to `TRUE`. We could inst

[GitHub] [arrow] westonpace closed issue #28074: [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down

2023-02-09 Thread via GitHub
westonpace closed issue #28074: [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down URL: https://github.com/apache/arrow/issues/28074

[GitHub] [arrow] wjones127 closed issue #34080: [Python] Remove warning "Python binding for RoundBinaryOptions not exposed"

2023-02-09 Thread via GitHub
wjones127 closed issue #34080: [Python] Remove warning "Python binding for RoundBinaryOptions not exposed" URL: https://github.com/apache/arrow/issues/34080

[GitHub] [arrow] wjones127 closed issue #34078: [C++][Parquet] Clean up BloomFilter API

2023-02-09 Thread via GitHub
wjones127 closed issue #34078: [C++][Parquet] Clean up BloomFilter API URL: https://github.com/apache/arrow/issues/34078

[GitHub] [arrow] wgtmac opened a new issue, #34106: [C++][Parquet] Fix

2023-02-09 Thread via GitHub
wgtmac opened a new issue, #34106: URL: https://github.com/apache/arrow/issues/34106 ### Describe the bug, including details regarding any error messages, version, and platform. Commit for this issue https://github.com/apache/arrow/issues/15042 has fixed the missing statistics, but i

[GitHub] [arrow] thisisnic opened a new issue, #34110: [R] Local checkouts look for incorrect version numbers

2023-02-09 Thread via GitHub
thisisnic opened a new issue, #34110: URL: https://github.com/apache/arrow/issues/34110 ### Describe the bug, including details regarding any error messages, version, and platform. In #13622, the capability to use artifactory for libarrow binaries was added. Logic was added to trunc

[GitHub] [arrow] thisisnic closed issue #34110: [R] Local checkouts look for incorrect version numbers

2023-02-09 Thread via GitHub
thisisnic closed issue #34110: [R] Local checkouts look for incorrect version numbers URL: https://github.com/apache/arrow/issues/34110

[GitHub] [arrow] kou opened a new issue, #34111: [CI][Packaging][Java] java-jars was broken by updating bundled aws-sdk-cpp

2023-02-09 Thread via GitHub
kou opened a new issue, #34111: URL: https://github.com/apache/arrow/issues/34111 ### Describe the bug, including details regarding any error messages, version, and platform. #33808 broke java-jars job: https://github.com/apache/arrow/pull/33808#issuecomment-1424873183

[GitHub] [arrow] kou closed issue #34074: [GLib][FlightRPC] Add `gaflight_client_authenticate_basic_token()`

2023-02-09 Thread via GitHub
kou closed issue #34074: [GLib][FlightRPC] Add `gaflight_client_authenticate_basic_token()` URL: https://github.com/apache/arrow/issues/34074

[GitHub] [arrow] kou closed issue #34094: [C++] Boost minimum version needs increasing for clang16 builds

2023-02-09 Thread via GitHub
kou closed issue #34094: [C++] Boost minimum version needs increasing for clang16 builds URL: https://github.com/apache/arrow/issues/34094

[GitHub] [arrow] mapleFU opened a new issue, #34113: [C++] Upgrade zstd to v1.5.4

2023-02-09 Thread via GitHub
mapleFU opened a new issue, #34113: URL: https://github.com/apache/arrow/issues/34113 ### Describe the enhancement requested Zstd has released its latest version, v1.5.4, which contains many performance improvements on x86 and aarch64. The release notes can be seen here: http

[GitHub] [arrow-flight-sql-postgresql] kou closed issue #10: Add support for opening a database

2023-02-09 Thread via GitHub
kou closed issue #10: Add support for opening a database URL: https://github.com/apache/arrow-flight-sql-postgresql/issues/10

[GitHub] [arrow] thisisnic opened a new issue, #34115: [R][CI] R binary builds fail with package version numbers with 4 levels (e.g. 11.0.0.1)

2023-02-09 Thread via GitHub
thisisnic opened a new issue, #34115: URL: https://github.com/apache/arrow/issues/34115 ### Describe the bug, including details regarding any error messages, version, and platform. Changes in #14396 updated the logic for how the paths were constructed on the builds where the Arrow C+

[GitHub] [arrow] ktf opened a new issue, #34117: Extend cast operators for int8

2023-02-10 Thread via GitHub
ktf opened a new issue, #34117: URL: https://github.com/apache/arrow/issues/34117 ### Describe the enhancement requested In our analysis, we have the need to produce int8 data which can then optionally be processed as int32 or float32. ### Component(s) C++ - Gandiva

[GitHub] [arrow] pulkomandy closed issue #34093: [C++] Cross compiling in Yocto is not working

2023-02-10 Thread via GitHub
pulkomandy closed issue #34093: [C++] Cross compiling in Yocto is not working URL: https://github.com/apache/arrow/issues/34093

[GitHub] [arrow] wence- opened a new issue, #34118: Allow configuration of size of AWS event loop thread pool

2023-02-10 Thread via GitHub
wence- opened a new issue, #34118: URL: https://github.com/apache/arrow/issues/34118 ### Describe the enhancement requested When calling `DoInitializeS3`, arrow initialises the AWS API, which by default creates a thread pool for the background AWS event loop that uses one thr

[GitHub] [arrow] DanTm99 opened a new issue, #34119: Add [] operator to Schema

2023-02-10 Thread via GitHub
DanTm99 opened a new issue, #34119: URL: https://github.com/apache/arrow/issues/34119 ### Describe the enhancement requested Add the `[]` operator to `Schema` which calls `GetFieldByIndex` or `GetFieldByName`. ### Component(s) C#

[GitHub] [arrow] nbro10 opened a new issue, #34120: Cannot install pyarrow in MacOS Monterey (12.5.1) with M1 in a Python 3.7.13

2023-02-10 Thread via GitHub
nbro10 opened a new issue, #34120: URL: https://github.com/apache/arrow/issues/34120 ### Describe the usage question you have. Please include as many useful details as possible. I created and activated a Python 3.7.13 virtual environment. I am managing my Python versions with `p

[GitHub] [arrow] Kodiologist opened a new issue, #34121: Allow converting strings to dates without using datetimes as an intermediate step

2023-02-10 Thread via GitHub
Kodiologist opened a new issue, #34121: URL: https://github.com/apache/arrow/issues/34121 ### Describe the enhancement requested ``` import pyarrow as pa, pyarrow.compute as C x = pa.array(['2008-01-01', '2008-01-02', '2008-01-03']) ``` This works fine: ```

[GitHub] [arrow] westonpace opened a new issue, #34122: [C++] Use special URI to allow calling UDFs via Substrait

2023-02-10 Thread via GitHub
westonpace opened a new issue, #34122: URL: https://github.com/apache/arrow/issues/34122 ### Describe the enhancement requested We should key on a special URI (e.g. https://apache.org/arrow/udf) to recognize that a Substrait call is actually looking for an Acero registered UDF. Then

[GitHub] [arrow] westonpace opened a new issue, #34123: [Python] Expose nested function registries

2023-02-10 Thread via GitHub
westonpace opened a new issue, #34123: URL: https://github.com/apache/arrow/issues/34123 ### Describe the enhancement requested We have the ability to create nested function registries in C++. This is useful to allow UDFs to be scoped to a query. We should expose this feature in p

[GitHub] [arrow] wjones127 closed issue #33115: [C++] Parquet support read page with crc32 checking

2023-02-10 Thread via GitHub
wjones127 closed issue #33115: [C++] Parquet support read page with crc32 checking URL: https://github.com/apache/arrow/issues/33115

[GitHub] [arrow] wjones127 closed issue #34086: [C++][Parquet] Parquet V2 page headers have incorrect number of rows

2023-02-10 Thread via GitHub
wjones127 closed issue #34086: [C++][Parquet] Parquet V2 page headers have incorrect number of rows URL: https://github.com/apache/arrow/issues/34086

[GitHub] [arrow] paleolimbot closed issue #33904: [R] Using s3_bucket with non-AWS S3-compatible storage is confusing

2023-02-10 Thread via GitHub
paleolimbot closed issue #33904: [R] Using s3_bucket with non-AWS S3-compatible storage is confusing URL: https://github.com/apache/arrow/issues/33904

[GitHub] [arrow] assignUser closed issue #29773: link error on ubuntu

2023-02-10 Thread via GitHub
assignUser closed issue #29773: link error on ubuntu URL: https://github.com/apache/arrow/issues/29773

[GitHub] [arrow] westonpace closed issue #33899: [C++] Add `NamedTapRel` relation as a Substrait extension

2023-02-10 Thread via GitHub
westonpace closed issue #33899: [C++] Add `NamedTapRel` relation as a Substrait extension URL: https://github.com/apache/arrow/issues/33899

[GitHub] [arrow] assignUser opened a new issue, #34129: [Dev][Release] Add all bundled dependencies to artifactory mirror

2023-02-10 Thread via GitHub
assignUser opened a new issue, #34129: URL: https://github.com/apache/arrow/issues/34129 ### Describe the enhancement requested While updating the artifactory mirror of bundled dependencies I noticed that a bunch of new dependencies do not have the cmake logic that allows them to fal

[GitHub] [arrow] assignUser opened a new issue, #34130: [Dev][C++] Don't use GitHub archive files with checksums

2023-02-10 Thread via GitHub
assignUser opened a new issue, #34130: URL: https://github.com/apache/arrow/issues/34130 ### Describe the bug, including details regarding any error messages, version, and platform. Recently it became apparent that the often used github archive links are not hash stable https://gith

[GitHub] [arrow] assignUser opened a new issue, #34131: [CI] Use artifactory mirror for bundled dependencies in CI job

2023-02-10 Thread via GitHub
assignUser opened a new issue, #34131: URL: https://github.com/apache/arrow/issues/34131 ### Describe the enhancement requested We should use the bundled dependencies from the artifactory mirror in at least one nightly build so we see when there are issues, e.g. missing new versions.

[GitHub] [arrow] assignUser opened a new issue, #34132: [Dev] Add script to keep artifactory mirror of bundled dependencies in sync

2023-02-10 Thread via GitHub
assignUser opened a new issue, #34132: URL: https://github.com/apache/arrow/issues/34132 ### Describe the enhancement requested At this point we have to manually get the dependencies and upload them to jfrog. There should be a script to automate this. ### Component(s) De

[GitHub] [arrow] westonpace opened a new issue, #34135: [C++] Parallel asof join node

2023-02-10 Thread via GitHub
westonpace opened a new issue, #34135: URL: https://github.com/apache/arrow/issues/34135 ### Describe the enhancement requested Now that we are starting to introduce formal ordering we can create an AsofJoinNode variant that works even if use_threads is true. A rough overview of the

[GitHub] [arrow] westonpace opened a new issue, #34136: [C++] Add the concept of "ordering" to an exec node, reject non-sensible plans

2023-02-10 Thread via GitHub
westonpace opened a new issue, #34136: URL: https://github.com/apache/arrow/issues/34136 ### Describe the enhancement requested Every node has an "ordering" which describes what the batch index of the batches produced by that node corresponds to. Source nodes will generally hav

[GitHub] [arrow] westonpace closed issue #34059: [C++] Create a fetch node based on a batch index property

2023-02-10 Thread via GitHub
westonpace closed issue #34059: [C++] Create a fetch node based on a batch index property URL: https://github.com/apache/arrow/issues/34059

[GitHub] [arrow] wgtmac opened a new issue, #34138: [C++][Parquet] Fix parsing stats from min_value/max_value

2023-02-10 Thread via GitHub
wgtmac opened a new issue, #34138: URL: https://github.com/apache/arrow/issues/34138 ### Describe the bug, including details regarding any error messages, version, and platform. The code below does not check and read from stats.min_value/max_value. If reading from a parquet file wher

[GitHub] [arrow] wgtmac opened a new issue, #34139: [C++][Parquet] Ignore corrupted or invalid statistics

2023-02-10 Thread via GitHub
wgtmac opened a new issue, #34139: URL: https://github.com/apache/arrow/issues/34139 ### Describe the bug, including details regarding any error messages, version, and platform. https://github.com/apache/arrow/pull/34112 fixes reading from stats.min_value and stats.max_value where ap

[GitHub] [arrow] assignUser closed issue #18865: [C++][Build] Cannot build with Parquet/Thrift support on CentOS 7

2023-02-10 Thread via GitHub
assignUser closed issue #18865: [C++][Build] Cannot build with Parquet/Thrift support on CentOS 7 URL: https://github.com/apache/arrow/issues/18865

[GitHub] [arrow] assignUser closed issue #29741: [C++][Docs][Parquet] Trouble installing on Cent OS 7

2023-02-10 Thread via GitHub
assignUser closed issue #29741: [C++][Docs][Parquet] Trouble installing on Cent OS 7 URL: https://github.com/apache/arrow/issues/29741

[GitHub] [arrow] Shaheer-Ahmd opened a new issue, #34141: Unable to send email to mailing list.

2023-02-11 Thread via GitHub
Shaheer-Ahmd opened a new issue, #34141: URL: https://github.com/apache/arrow/issues/34141 ### Describe the bug, including details regarding any error messages, version, and platform. Sending an email to `d...@arrow.apache.org` gives the following error. ![image](https://user-imag

[GitHub] [arrow] wgtmac opened a new issue, #34142: [C++][Parquet] ColumnWriter avoids splitting a record into different pages

2023-02-11 Thread via GitHub
wgtmac opened a new issue, #34142: URL: https://github.com/apache/arrow/issues/34142 ### Describe the enhancement requested For now ColumnWriter determines page boundary solely based on the buffered page size. This can lead to a record being split into different pages. Although writi

[GitHub] [arrow] phofl opened a new issue, #34143: DOC: fill_null not showing in pyarrow API reference

2023-02-11 Thread via GitHub
phofl opened a new issue, #34143: URL: https://github.com/apache/arrow/issues/34143 ### Describe the bug, including details regarding any error messages, version, and platform. Can't sign up for Jira, hence I opened the issue here. ``fill_null`` was removed in 909379516a3e57516

[GitHub] [arrow] cboettig opened a new issue, #34145: [Python] Specifying schema does not prevent arrow from reading metadata on every single parquet?

2023-02-11 Thread via GitHub
cboettig opened a new issue, #34145: URL: https://github.com/apache/arrow/issues/34145 ### Describe the bug, including details regarding any error messages, version, and platform. Consider the following reprex, in which we open a partitioned parquet dataset on a remote S3 bucket:

[GitHub] [arrow] kou closed issue #34113: [C++] Upgrade zstd to v1.5.4

2023-02-11 Thread via GitHub
kou closed issue #34113: [C++] Upgrade zstd to v1.5.4 URL: https://github.com/apache/arrow/issues/34113

[GitHub] [arrow] kou closed issue #34143: [Python[Docs] fill_null not showing in pyarrow API reference

2023-02-11 Thread via GitHub
kou closed issue #34143: [Python[Docs] fill_null not showing in pyarrow API reference URL: https://github.com/apache/arrow/issues/34143

[GitHub] [arrow] mapleFU opened a new issue, #34147: [C++][Parquet] Support Crc32 write and verify for DICT_PAGE

2023-02-11 Thread via GitHub
mapleFU opened a new issue, #34147: URL: https://github.com/apache/arrow/issues/34147 ### Describe the enhancement requested This issue is part of https://issues.apache.org/jira/browse/ARROW-17904 . Previously, we supported crc32 for DATA_PAGE_V1; in this patch we need to support DICT_
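As a rough illustration of what writing and verifying a page checksum involves, the sketch below uses Python's standard `zlib.crc32`. It is not Arrow's C++ implementation: `page_checksum` and `verify_page` are hypothetical names, and the assumption that the checksum covers the serialized page bytes without the page header should be checked against the Parquet format specification.

```python
import zlib


def page_checksum(page_bytes: bytes) -> int:
    """Return the standard CRC32 of the page bytes as an unsigned 32-bit int."""
    return zlib.crc32(page_bytes) & 0xFFFFFFFF


def verify_page(page_bytes: bytes, expected_crc: int) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return page_checksum(page_bytes) == expected_crc


# Writer side: store the checksum next to the page (hypothetical payload).
page = b"dictionary page payload"
stored_crc = page_checksum(page)

# Reader side: a single flipped byte should be detected.
corrupted = b"Dictionary page payload"
print(verify_page(page, stored_crc), verify_page(corrupted, stored_crc))
```

The same two functions would apply to any page type, since the checksum depends only on the bytes passed in.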

[GitHub] [arrow-flight-sql-postgresql] kou closed issue #15: Add support for session timeout

2023-02-11 Thread via GitHub
kou closed issue #15: Add support for session timeout URL: https://github.com/apache/arrow-flight-sql-postgresql/issues/15

[GitHub] [arrow] mapleFU opened a new issue, #34148: [C++] Minimal build failed in crossbow because of zstd v1.5.4 requires CMake v3.18

2023-02-11 Thread via GitHub
mapleFU opened a new issue, #34148: URL: https://github.com/apache/arrow/issues/34148 ### Describe the bug, including details regarding any error messages, version, and platform. After patch https://github.com/apache/arrow/pull/34114 is merged, crossbow CI was broken: https://github

[GitHub] [arrow] rtpsw opened a new issue, #34150: [C++] [Python] Fix improper initialization of `ConversionOptions`

2023-02-12 Thread via GitHub
rtpsw opened a new issue, #34150: URL: https://github.com/apache/arrow/issues/34150 ### Describe the bug, including details regarding any error messages, version, and platform. In a debugging session, I observed that `ConversionOptions` being passed from `_substrait.pyx` was improper

[GitHub] [arrow] chenrui333 opened a new issue, #34151: apache-arrow 11.0.0 build failure

2023-02-12 Thread via GitHub
chenrui333 opened a new issue, #34151: URL: https://github.com/apache/arrow/issues/34151 ### Describe the bug, including details regarding any error messages, version, and platform. Currently, apache-arrow 11.0.0 fails to build on macOS. Relates to https://github.com/Homebrew/ho

[GitHub] [arrow] coady opened a new issue, #34153: [C++][Python] Binary search for sorted tables.

2023-02-12 Thread via GitHub
coady opened a new issue, #34153: URL: https://github.com/apache/arrow/issues/34153 ### Describe the enhancement requested While I support the decision for tables to not have index columns, I have still found it useful to enable fast binary search when it's known that a table is sort
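The idea of binary search over a table known to be sorted can be sketched in plain Python with the standard `bisect` module. This is only an illustration of the technique, not a pyarrow API: the table is modelled as a dict of equal-length lists, and `equal_range` and `lookup_rows` are hypothetical helpers.

```python
from bisect import bisect_left, bisect_right


def equal_range(sorted_keys, value):
    """Return (start, stop) such that sorted_keys[start:stop] all equal value."""
    return bisect_left(sorted_keys, value), bisect_right(sorted_keys, value)


def lookup_rows(table, key_column, value):
    """Slice every column to the rows whose sorted key equals value.

    Assumes table[key_column] is sorted ascending; each lookup then costs
    O(log n) comparisons instead of a full scan.
    """
    start, stop = equal_range(table[key_column], value)
    return {name: column[start:stop] for name, column in table.items()}


table = {"id": [1, 2, 2, 5], "value": ["a", "b", "c", "d"]}
print(lookup_rows(table, "id", 2))
```

Because the result is a contiguous slice, a columnar implementation could return it without copying; that is a design observation, not a claim about how Arrow would implement it.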

[GitHub] [arrow-adbc] wjones127 opened a new issue, #453: [C][Sqlite] entries field in get_info result is nullable

2023-02-12 Thread via GitHub
wjones127 opened a new issue, #453: URL: https://github.com/apache/arrow-adbc/issues/453 We make the keys non-nullable, but not the entries field. https://github.com/apache/arrow-adbc/blob/317e47fde750c4f80f104a2b7a04ce8d33cb23ef/c/driver/sqlite/sqlite.c#L299 It is required to

[GitHub] [arrow] Fokko opened a new issue, #34154: [Python] Add `is_nan` for Python

2023-02-12 Thread via GitHub
Fokko opened a new issue, #34154: URL: https://github.com/apache/arrow/issues/34154 ### Describe the enhancement requested We currently use `field.is_null(nan_is_null=True) & field.is_valid()` for filtering on NaN values, but would be great to fall back to `is_nan`. The abovementione

[GitHub] [arrow] thisisnic opened a new issue, #34155: [R] BugReports field in DESCRIPTION file needs updating to point to GH Issues

2023-02-12 Thread via GitHub
thisisnic opened a new issue, #34155: URL: https://github.com/apache/arrow/issues/34155 ### Describe the bug, including details regarding any error messages, version, and platform. The BugReports field currently points to Jira but should be updated to point to GH Issues instead

[GitHub] [arrow] thisisnic closed issue #34155: [R] BugReports field in DESCRIPTION file needs updating to point to GH Issues

2023-02-12 Thread via GitHub
thisisnic closed issue #34155: [R] BugReports field in DESCRIPTION file needs updating to point to GH Issues URL: https://github.com/apache/arrow/issues/34155

[GitHub] [arrow] js8544 opened a new issue, #34157: [C++] Configure bundled AWS SDK to use aws-lc instead of OpenSSL

2023-02-13 Thread via GitHub
js8544 opened a new issue, #34157: URL: https://github.com/apache/arrow/issues/34157 ### Describe the enhancement requested Using OpenSSL causes various issues like https://github.com/apache/arrow/pull/33808#issuecomment-1408247269 and https://github.com/apache/arrow/issues/34111. We

[GitHub] [arrow] AlenkaF opened a new issue, #34160: [Docs][Release] Multiple copies/versions of versionwarning.js

2023-02-13 Thread via GitHub
AlenkaF opened a new issue, #34160: URL: https://github.com/apache/arrow/issues/34160 ### Describe the bug, including details regarding any error messages, version, and platform. There are currently three versions of the `versionwarning.js` file in apache/arrow-site: - docs/_s

[GitHub] [arrow] nealrichardson closed issue #33960: [R] Output schema for aggregation is sometimes innacurate

2023-02-13 Thread via GitHub
nealrichardson closed issue #33960: [R] Output schema for aggregation is sometimes innacurate URL: https://github.com/apache/arrow/issues/33960

[GitHub] [arrow] nealrichardson closed issue #33892: [R] Map `dplyr::n()` to `count_all` kernel

2023-02-13 Thread via GitHub
nealrichardson closed issue #33892: [R] Map `dplyr::n()` to `count_all` kernel URL: https://github.com/apache/arrow/issues/33892

[GitHub] [arrow] Fokko opened a new issue, #34162: [Python] `is_null(nan_is_null=True)` does not work with only NaN's

2023-02-13 Thread via GitHub
Fokko opened a new issue, #34162: URL: https://github.com/apache/arrow/issues/34162 ### Describe the bug, including details regarding any error messages, version, and platform. I was working on some test-cases for the PyIceberg integration, and hit this edge case. When you have a fil

[GitHub] [arrow] zeroshade closed issue #34101: [Go] pqarrow.NewSchemaManifest creates wrong schema field for array object fields

2023-02-13 Thread via GitHub
zeroshade closed issue #34101: [Go] pqarrow.NewSchemaManifest creates wrong schema field for array object fields URL: https://github.com/apache/arrow/issues/34101

[GitHub] [arrow] NoahFournier opened a new issue, #34163: [C++][CI] Typo in build_orc CMake macro

2023-02-13 Thread via GitHub
NoahFournier opened a new issue, #34163: URL: https://github.com/apache/arrow/issues/34163 ### Describe the bug, including details regarding any error messages, version, and platform. I've found a typo in the build_orc macro in the ThirdPartyToolchain, which means that the orc build

[GitHub] [arrow] zeroshade closed issue #34077: [Go] Implement RunEndEncoded scalar

2023-02-13 Thread via GitHub
zeroshade closed issue #34077: [Go] Implement RunEndEncoded scalar URL: https://github.com/apache/arrow/issues/34077

[GitHub] [arrow] AlenkaF opened a new issue, #34165: [Python] Extension array data type should default to the storage type if to_pandas_dtype is not implemented

2023-02-13 Thread via GitHub
AlenkaF opened a new issue, #34165: URL: https://github.com/apache/arrow/issues/34165 ### Describe the bug, including details regarding any error messages, version, and platform. When working on the extension type for tensors in PyArrow I came across a behaviour of the conversion to

[GitHub] [arrow] egillax opened a new issue, #34166: [R] int64 not preserved when calling dplyr::collect

2023-02-13 Thread via GitHub
egillax opened a new issue, #34166: URL: https://github.com/apache/arrow/issues/34166 ### Describe the bug, including details regarding any error messages, version, and platform. When collecting arrow tables with 64 bit integer columns the column is converted to 32 bit integer. In th

[GitHub] [arrow] zeroshade opened a new issue, #34171: [Go][Compute] Add kernel for "unique" function

2023-02-13 Thread via GitHub
zeroshade opened a new issue, #34171: URL: https://github.com/apache/arrow/issues/34171 ### Describe the enhancement requested Following up on #33466, in order to implement direct and efficient handling of dictionary arrays to/from parquet without having to expand them out, we first

[GitHub] [arrow] mroeschke opened a new issue, #34173: [Python]. Allow pyarrow.compute.mode to include null count

2023-02-13 Thread via GitHub
mroeschke opened a new issue, #34173: URL: https://github.com/apache/arrow/issues/34173 ### Describe the enhancement requested There is a `skip_nulls` argument to dictate whether nulls should make the result null or be skipped, but it would potentially be useful for `mode` to retu

[GitHub] [arrow-adbc] lidavidm opened a new issue, #454: [Python] Add __del__ to DBAPI objects

2023-02-13 Thread via GitHub
lidavidm opened a new issue, #454: URL: https://github.com/apache/arrow-adbc/issues/454 Just for convenience/ease of use. Optionally we can have this emit a warning to aid debuggability?

[GitHub] [arrow] lidavidm opened a new issue, #34174: [Docs][Release] Add 'tweet out the blog post' as a post-release task

2023-02-13 Thread via GitHub
lidavidm opened a new issue, #34174: URL: https://github.com/apache/arrow/issues/34174 ### Describe the enhancement requested We tend to forget to do this. While Twitter has been shaky recently, it's still used quite a bit, and it would be good to promote new releases. What do people

[GitHub] [arrow] james-camacho-ab closed issue #12892: [R] Arrow install on Databricks cluster takes 10+ minutes

2023-02-13 Thread via GitHub
james-camacho-ab closed issue #12892: [R] Arrow install on Databricks cluster takes 10+ minutes URL: https://github.com/apache/arrow/issues/12892

[GitHub] [arrow] lidavidm opened a new issue, #34175: [Docs] .github/CONTRIBUTING.md still references Jira

2023-02-13 Thread via GitHub
lidavidm opened a new issue, #34175: URL: https://github.com/apache/arrow/issues/34175 ### Describe the bug, including details regarding any error messages, version, and platform. This should be updated to reflect that we now use GitHub Issues. This one is important since it ap

[GitHub] [arrow] wjones127 closed issue #15231: [Benchmarking][C++] Track memory usage in C++ microbenchmarks

2023-02-13 Thread via GitHub
wjones127 closed issue #15231: [Benchmarking][C++] Track memory usage in C++ microbenchmarks URL: https://github.com/apache/arrow/issues/15231

[GitHub] [arrow] ianmcook closed issue #34166: [R] int64 not preserved when calling dplyr::collect

2023-02-13 Thread via GitHub
ianmcook closed issue #34166: [R] int64 not preserved when calling dplyr::collect URL: https://github.com/apache/arrow/issues/34166

[GitHub] [arrow] felipecrv opened a new issue, #34176: Finish basic Run-End Encoded arrays support in C++

2023-02-13 Thread via GitHub
felipecrv opened a new issue, #34176: URL: https://github.com/apache/arrow/issues/34176 ### Describe the enhancement requested C++ related issues that are sub-tasks of #32104 that haven't been fixed by #33641. - [ ] #32105 - [ ] #32107 - [ ] #20351 - [ ] #32773

[GitHub] [arrow] raulcd closed issue #34023: [Docs] Version warning about viewing old docs doesn't work for versions >= 10

2023-02-14 Thread via GitHub
raulcd closed issue #34023: [Docs] Version warning about viewing old docs doesn't work for versions >= 10 URL: https://github.com/apache/arrow/issues/34023

[GitHub] [arrow] jorisvandenbossche closed issue #34104: [Python][Docs] deduplicate doesn't match docs

2023-02-14 Thread via GitHub
jorisvandenbossche closed issue #34104: [Python][Docs] deduplicate doesn't match docs URL: https://github.com/apache/arrow/issues/34104

[GitHub] [arrow-adbc] paleolimbot opened a new issue, #455: Error building go driver manager (no C++ standard set)

2023-02-14 Thread via GitHub
paleolimbot opened a new issue, #455: URL: https://github.com/apache/arrow-adbc/issues/455 In verifying the release candidate I got I'm on MacOS M1 (no conda) and have run into this before...there must be at least some compiler default which is very picky about setting the

[GitHub] [arrow-adbc] paleolimbot opened a new issue, #456: Test failure when verifying Python release candidate

2023-02-14 Thread via GitHub
paleolimbot opened a new issue, #456: URL: https://github.com/apache/arrow-adbc/issues/456 I get the following when running the release candidate verification: ``` === FAILURES =

[GitHub] [arrow] deanm0000 opened a new issue, #34180: Expose more metadata in pyarrow.parquet.ParquetFile.metadata

2023-02-14 Thread via GitHub
deanm0000 opened a new issue, #34180: URL: https://github.com/apache/arrow/issues/34180 ### Describe the enhancement requested I'm not sure if this issue pertains to all implementations of arrow including pyarrow or just c++ but related to this https://github.com/apache/arrow/issues/

[GitHub] [arrow] zeroshade closed issue #34055: [Go][CI] Add test run in CI that uses `noasm` tag

2023-02-14 Thread via GitHub
zeroshade closed issue #34055: [Go][CI] Add test run in CI that uses `noasm` tag URL: https://github.com/apache/arrow/issues/34055

[GitHub] [arrow] zeroshade closed issue #34171: [Go][Compute] Add kernel for "unique" function

2023-02-14 Thread via GitHub
zeroshade closed issue #34171: [Go][Compute] Add kernel for "unique" function URL: https://github.com/apache/arrow/issues/34171
