Re: [I] pyarrow.fs.HadoopFileSystem Usage Problems [arrow]

2024-11-30 Thread via GitHub
deep826 closed issue #41777: pyarrow.fs.HadoopFileSystem Usage Problems URL: https://github.com/apache/arrow/issues/41777 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[I] [C++][Parquet] Loading big parquet files leads to huge memory consumption [arrow]

2024-11-30 Thread via GitHub
KuczaRacza opened a new issue, #44890: URL: https://github.com/apache/arrow/issues/44890 ### Describe the bug, including details regarding any error messages, version, and platform. Loading big parquet files (23GB file) leads to high memory consumption, ending in OOM. Memory consumpt

[I] [R] Add option to ignore partition file names when querying an arrow open_dataset() ? [arrow]

2024-11-30 Thread via GitHub
JakeRuss opened a new issue, #44889: URL: https://github.com/apache/arrow/issues/44889 ### Describe the enhancement requested I have a dataset which is hosted on AWS S3 as hive partitioned parquet files. The data is written to S3 by a Python job via pandas with snappy compression and

Re: [I] [Dev][Release] Update test condition in utils-prepare.sh [arrow]

2024-11-30 Thread via GitHub
kou closed issue #44885: [Dev][Release] Update test condition in utils-prepare.sh URL: https://github.com/apache/arrow/issues/44885

[I] User or username parameter? [arrow-java]

2024-11-30 Thread via GitHub
james-mesney opened a new issue, #434: URL: https://github.com/apache/arrow-java/issues/434 ### Describe the usage question you have. Please include as many useful details as possible. The page https://arrow.apache.org/docs/java/flight_sql_jdbc_driver.html lists "user" as the pa

[I] Segmentation fault creating new scanner in Java [arrow]

2024-11-30 Thread via GitHub
ejf-ibm opened a new issue, #44888: URL: https://github.com/apache/arrow/issues/44888 ### Describe the bug, including details regarding any error messages, version, and platform. I'm currently hitting a segfault when trying to read in AWS CUR 2.0 billing files, writing out partitione