Re: [I] BUG: segmentation faults in the presence of `sparse` optional dependency (within conda builds) [arrow]
h-vetinari closed issue #15018: BUG: segmentation faults in the presence of `sparse` optional dependency (within conda builds) URL: https://github.com/apache/arrow/issues/15018
[I] [C++] Test linkage error when googletest 1.15.0 is installed system wide despite bundling [arrow]
amoeba opened a new issue, #43400: URL: https://github.com/apache/arrow/issues/43400 ### Describe the bug, including details regarding any error messages, version, and platform. On macOS 14.5, I upgraded my brew version of googletest from 1.14.0 to 1.15.0 and started seeing a test linkage error. My cmake command is:
```
cmake .. -GNinja -DARROW_ACERO=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON \
  -DARROW_DATASET=ON -DARROW_FILESYSTEM=ON -DARROW_FLIGHT=ON -DARROW_JSON=ON \
  -DARROW_PARQUET=ON -DARROW_AZURE=ON -DARROW_S3=ON -DARROW_GCS=ON \
  -DARROW_SUBSTRAIT=ON -DARROW_BUILD_TESTS=ON -DARROW_MIMALLOC=OFF \
  -DARROW_WITH_BROTLI=ON -DARROW_WITH_BZ2=ON -DARROW_WITH_LZ4=ON \
  -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON \
  -DARROW_INSTALL_NAME_RPATH=OFF -DARROW_EXTRA_ERROR_CONTEXT=ON \
  -DCMAKE_INSTALL_PREFIX=/Users/bryce/builds/arrow-arm64 -DCMAKE_BUILD_TYPE=Debug \
  -DGTest_SOURCE=BUNDLED -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```
When compiling, I get two linker errors (both similar to this one):
```
FAILED: debug/arrow-flight-test
: && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -fno-aligned-new -Qunused-arguments -fcolor-diagnostics -Wall -Wextra -Wdocumentation -DARROW_WARN_DOCUMENTATION -Wshorten-64-to-32 -Wno-missing-braces -Wno-unused-parameter -Wno-constant-logical-operand -Wno-return-stack-address -Wdate-time -Wno-unknown-warning-option -Wno-pass-failed -march=armv8-a -g -Werror -O0 -ggdb -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.5.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names src/arrow/flight/CMakeFiles/arrow-flight-test.dir/flight_test.cc.o -o debug/arrow-flight-test -Wl,-rpath,/Users/bryce/src/apache/arrow/cpp/build/debug -Wl,-rpath,/opt/homebrew/lib debug/libarrow_flight_testing.1800.0.0.dylib debug/libarrow_testing.1800.0.0.dylib debug/libarrow_gmockd.1.11.0.dylib debug/libarrow_gtest_maind.1.11.0.dylib debug/libarrow_flight.1800.0.0.dylib /opt/homebrew/lib/libgrpc++.1.62.2.dylib /opt/homebrew/lib/libgrpc.39.0.0.dylib /opt/homebrew/lib/libupb_json_lib.39.0.0.dylib /opt/homebrew/lib/libupb_textformat_lib.39.0.0.dylib /opt/homebrew/lib/libupb_message_lib.39.0.0.dylib /opt/homebrew/lib/libupb_base_lib.39.0.0.dylib /opt/homebrew/lib/libupb_mem_lib.39.0.0.dylib /opt/homebrew/lib/libutf8_range_lib.39.0.0.dylib /opt/homebrew/lib/libre2.11.0.0.dylib /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.5.sdk/usr/lib/libz.tbd /opt/homebrew/lib/libcares.2.17.2.dylib -lresolv /opt/homebrew/lib/libgpr.39.0.0.dylib /opt/homebrew/opt/openssl@3/lib/libssl.dylib /opt/homebrew/opt/openssl@3/lib/libcrypto.dylib /opt/homebrew/lib/libaddress_sorting.39.0.0.dylib -lm -framework CoreFoundation /opt/homebrew/lib/libprotobuf.27.1.0.dylib /opt/homebrew/lib/libabsl_log_internal_check_op.2401.0.0.dylib /opt/homebrew/lib/libabsl_leak_check.2401.0.0.dylib /opt/homebrew/lib/libabsl_die_if_null.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_conditions.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_message.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_nullguard.2401.0.0.dylib /opt/homebrew/lib/libabsl_examine_stack.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_format.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_proto.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_log_sink_set.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_sink.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_entry.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_internal.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_marshalling.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_reflection.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_config.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_program_name.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_private_handle_accessor.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_commandlineflag.2401.0.0.dylib /opt/homebrew/lib/libabsl_flags_commandlineflag_internal.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_initialize.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_globals.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_globals.2401.0.0.dylib /opt/homebrew/lib/libabsl_vlog_config_internal.2401.0.0.dylib /opt/homebrew/lib/libabsl_log_internal_fnmatch.2401.0.0.dylib /opt/homebrew/lib/libabsl_raw_hash_set.2401.0.0.dylib /opt/homebrew/lib/libabsl_hash.2401.0.0.dylib /opt/homebrew/lib/libabsl_city.2401.0.0.dylib /opt/homebrew/lib/libabsl_low_level_hash.2401.0.0.dylib /opt/homebrew/lib/libabsl_hashtablez_sampler.2401.0.0.dylib /opt/homebrew/lib/libabsl_random_distributions.2401.0.0.dylib /opt/homebrew/lib/libabsl_random_seed_sequences.2401.0.0.dylib /opt/homebrew/lib/libabsl_random_in
```
[I] Add CI jobs for windows aarch64 [arrow]
jonkeane opened a new issue, #43401: URL: https://github.com/apache/arrow/issues/43401 ### Describe the enhancement requested [R 4.4 has experimental support for windows on aarch64](https://blog.r-project.org/2024/04/23/r-on-64-bit-arm-windows/index.html). We should set up a CI job to confirm that the arrow package builds on that platform. ### Component(s) R
Re: [I] [Java] Remove use of jsr305 [arrow]
lidavidm closed issue #43396: [Java] Remove use of jsr305 URL: https://github.com/apache/arrow/issues/43396
Re: [I] c/driver/postgresql: Connection.adbc_get_table_schema() does not respect column order [arrow-adbc]
lidavidm closed issue #2006: c/driver/postgresql: Connection.adbc_get_table_schema() does not respect column order URL: https://github.com/apache/arrow-adbc/issues/2006
[I] The Go flightsql driver doesn't handle scanning LargeString or LargeBinary types [arrow]
phillipleblanc opened a new issue, #43403: URL: https://github.com/apache/arrow/issues/43403 ### Describe the bug, including details regarding any error messages, version, and platform. The Go flightsql database driver currently does not handle scanning string or byte values where the Arrow type is LargeString or LargeBinary. This code will fail with `type *array.LargeString: not supported` today:
```go
db, err := sql.Open("flightsql", "flightsql://some_endpoint")
if err != nil {
	panic(err)
}
defer db.Close()

row := db.QueryRow("SELECT string_value FROM table LIMIT 1")
log.Println("Reading row")

var string_value string
if err := row.Scan(&string_value); err != nil {
	// If `string_value` is a LargeString type, this will error
}
```
### Component(s) Go
[I] Progress bar for `read_feather` for `R` and a verbose version [arrow]
ajinkya-k opened a new issue, #43404: URL: https://github.com/apache/arrow/issues/43404 ### Describe the enhancement requested I would like to request that a progress bar be shown when using the `read_feather` function in `R`, especially for large files, so that the user can see whether the file is actually being read and progress is being made, similar to `data.table::fread`, which shows a simple progress bar enabled via the `showProgress` argument. I have a use case in which I am using `read_feather` to read a large file into `R` from a network drive, and there is no indication whether `R` is even making progress on loading the file during some runs; in others it loads in ~300 seconds. `fread` also has a verbose option which dumps a lot more output and would be well worth implementing too, but a progress bar at minimum would be great! ### Component(s) R
[I] Could not read encrypted metadata via pq.read_table [arrow]
heyuqi1970 opened a new issue, #43406: URL: https://github.com/apache/arrow/issues/43406 ### Describe the bug, including details regarding any error messages, version, and platform. os: macos 11.7.10 (20G1427) python: 3.9.7 pyarrow: 16.0.0 When I use pq.read_table with the decryption_properties parameter, I get the following error. I can, however, use pq.ParquetFile with decryption_properties to read the same encrypted file.
```
Traceback (most recent call last):
  File "tt.py", line 98, in <module>
    table = pq.read_table("yellow_cryp.parquet", memory_map=True, decryption_properties=decryption_properties)
  File "/venv/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 1762, in read_table
    dataset = ParquetDataset(
  File "/venv/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 1329, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 1431, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: Could not open Parquet input source 'yellow_cryp.parquet': Could not read encrypted metadata, no decryption found in reader's properties
```
### Component(s) Parquet, Python
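For reference, a minimal sketch of the working path the reporter describes, assuming `decryption_properties` was produced by a `CryptoFactory` (the reporter's setup code is not shown in the report):

```python
import pyarrow.parquet as pq

# Reading through ParquetFile honors the decryption properties
# (assumes `decryption_properties` came from something like
# crypto_factory.file_decryption_properties(kms_connection_config)):
pf = pq.ParquetFile("yellow_cryp.parquet",
                    decryption_properties=decryption_properties)
table = pf.read()

# ...whereas the dataset-based path taken by pq.read_table fails
# with "Could not read encrypted metadata":
# pq.read_table("yellow_cryp.parquet", memory_map=True,
#               decryption_properties=decryption_properties)
```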
[I] [C++] IO: InputStream::Advance will always read from Stream [arrow]
mapleFU opened a new issue, #43408: URL: https://github.com/apache/arrow/issues/43408 ### Describe the enhancement requested
```c++
class ARROW_EXPORT InputStream : virtual public FileInterface, virtual public Readable {
 public:
  /// \brief Advance or skip stream indicated number of bytes
  /// \param[in] nbytes the number to move forward
  /// \return Status
  Status Advance(int64_t nbytes);
```
```c++
Status InputStream::Advance(int64_t nbytes) { return Read(nbytes).status(); }
```
`Advance` always calls `Read`, since it is not a virtual function. ### Component(s) C++
Re: [I] [Java][Benchmarking] Java benchmarks are broken when running with Java 17+ [arrow]
danepitkin closed issue #43394: [Java][Benchmarking] Java benchmarks are broken when running with Java 17+ URL: https://github.com/apache/arrow/issues/43394
Re: [I] [R] r-arrow cannot be compiled with clang [arrow]
assignUser closed issue #43398: [R] r-arrow cannot be compiled with clang URL: https://github.com/apache/arrow/issues/43398
[I] [Python]: Support PyCapsule Interface Objects as input in more places [arrow]
kylebarron opened a new issue, #43410: URL: https://github.com/apache/arrow/issues/43410 ### Describe the enhancement requested Now that the PyCapsule Interface is starting to gain more traction (https://github.com/apache/arrow/issues/39195), I think it would be great if some of pyarrow's functional APIs accepted any PyCapsule Interface object, and not _just_ pyarrow objects. Do people have opinions on what functions should or should not check for these objects? I'd argue that file format writers should check for them, because it's only a couple of lines of code, and the input stream will be fully iterated over regardless. E.g. looking at the Parquet writer: the high-level API doesn't currently accept a `RecordBatchReader` either, so support for both can come at the same time.
```py
from dataclasses import dataclass
from typing import Any

import pyarrow as pa
import pyarrow.parquet as pq


@dataclass
class ArrowCStream:
    obj: Any

    def __arrow_c_stream__(self, requested_schema=None):
        return self.obj.__arrow_c_stream__(requested_schema=requested_schema)


table = pa.table({"a": [1, 2, 3, 4]})
pq.write_table(table, "test.parquet")  # works

reader = pa.RecordBatchReader.from_stream(table)
pq.write_table(reader, "test.parquet")  # fails

pq.write_table(ArrowCStream(table), "test.parquet")  # fails
```
I'd argue that the writer should be generalized to accept any object with an `__arrow_c_stream__` dunder, and to ensure the stream is not materialized as a table. ### Component(s) Python
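A minimal sketch of the kind of generalization proposed here, assuming a hypothetical helper name `_as_record_batch_reader` (the real integration point inside each writer would differ):

```python
import pyarrow as pa

def _as_record_batch_reader(obj):
    # Hypothetical normalization step: accept anything that exports
    # the Arrow C stream interface, without materializing a Table.
    if isinstance(obj, pa.RecordBatchReader):
        return obj
    if hasattr(obj, "__arrow_c_stream__"):
        return pa.RecordBatchReader.from_stream(obj)
    raise TypeError(f"expected an Arrow stream, got {type(obj)!r}")
```

A writer built on such a helper could then iterate the reader batch by batch instead of requiring a `pa.Table` up front.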
[I] [Java][Benchmarking] Java benchmarks are still broken when running with Java 17+ [arrow]
danepitkin opened a new issue, #43412: URL: https://github.com/apache/arrow/issues/43412 ### Describe the bug, including details regarding any error messages, version, and platform. We didn't do the right testing and it turns out that this merged PR does not fully fix the Java benchmarks: https://github.com/apache/arrow/pull/43395 ### Component(s) Java
[I] [C++][Compute] Invalid memory access when resizing row table [arrow]
zanmato1984 opened a new issue, #43414: URL: https://github.com/apache/arrow/issues/43414 ### Describe the bug, including details regarding any error messages, version, and platform. When resizing the underlying buffer for the var-length content of the row table, we do: https://github.com/apache/arrow/blob/674e221f41c602c8f71c7a2c8e53e7c7c11b1ede/cpp/src/arrow/compute/row/row_internal.cc#L296-L299 This treats the second buffer (row content if the row table is fixed-length, or offsets otherwise) as offsets regardless of the fixed-length-ness. The fixed-length-ness is checked afterwards, and for a fixed-length table resizing the var-length buffer is unnecessary, so the function returns early. But treating the second buffer as offsets unconditionally is problematic because, not least, it can be sized smaller than an offset buffer requires. Consider a row table containing only one `uint8` column with a `1`-byte alignment: there will be `1` byte per row, less than the `4` bytes per row an offset requires, so the offset access goes beyond the buffer boundary. I have a repro case locally and will send it out as a unit test with my fix PR. ### Component(s) C++
[I] [CI][C++] Vcpkg failures building some wheels [arrow]
raulcd opened a new issue, #43416: URL: https://github.com/apache/arrow/issues/43416 ### Describe the bug, including details regarding any error messages, version, and platform. Some wheels have been failing for some days due to vcpkg failures:
- [wheel-macos-big-sur-cp310-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073275167/job/27846965111)
- [wheel-macos-big-sur-cp311-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073273602/job/27846957637)
- [wheel-macos-big-sur-cp312-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073274108/job/27846960846)
- [wheel-macos-big-sur-cp38-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073275488/job/27846966986)
- [wheel-macos-big-sur-cp39-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073275367/job/27846966545)
- [wheel-macos-catalina-cp310-amd64](https://github.com/ursacomputing/crossbow/actions/runs/10073273694/job/27846958347)
- [wheel-macos-catalina-cp311-amd64](https://github.com/ursacomputing/crossbow/actions/runs/10073273570/job/27846957447)
- [wheel-macos-catalina-cp312-amd64](https://github.com/ursacomputing/crossbow/actions/runs/10073274381/job/27846961100)
- [wheel-macos-catalina-cp38-amd64](https://github.com/ursacomputing/crossbow/actions/runs/10073273679/job/27846958339)
- [wheel-macos-catalina-cp39-amd64](https://github.com/ursacomputing/crossbow/actions/runs/10073275056/job/27846964797)
- [wheel-manylinux-2014-cp312-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073274856/job/27846963508)
- [wheel-manylinux-2014-cp38-arm64](https://github.com/ursacomputing/crossbow/actions/runs/10073273811/job/27846959167)

An example of failure:
```
/Users/runner/work/crossbow/crossbow/arrow/ci/vcpkg/arm64-osx-static-release.cmake: info: loaded overlay triplet from here
-- Downloading https://github.com/abseil/abseil-cpp/archive/20240116.2.tar.gz -> abseil-abseil-cpp-20240116.2.tar.gz...
-- Extracting source /Users/runner/work/crossbow/crossbow/vcpkg/downloads/abseil-abseil-cpp-20240116.2.tar.gz
-- Using source at /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/abseil/src/20240116.2-eaa4a5f5c0.clean
-- Found external ninja('1.12.1').
-- Configuring arm64-osx-static-release-rel
CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:112 (message):
  Command failed: /opt/homebrew/Cellar/cmake/3.30.0/bin/cmake /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/abseil/src/20240116.2-eaa4a5f5c0.clean -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/Users/runner/work/crossbow/crossbow/vcpkg/packages/abseil_arm64-osx-static-release -DFETCHCONTENT_FULLY_DISCONNECTED=ON -DABSL_PROPAGATE_CXX_STD=ON -DCMAKE_MAKE_PROGRAM=/opt/homebrew/bin/ninja -DCMAKE_SYSTEM_NAME=Darwin -DBUILD_SHARED_LIBS=OFF -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=/Users/runner/work/crossbow/crossbow/vcpkg/scripts/toolchains/osx.cmake -DVCPKG_TARGET_TRIPLET=arm64-osx-static-release -DVCPKG_SET_CHARSET_FLAG=ON -DVCPKG_PLATFORM_TOOLSET=external -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_SYSTEM_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_SKIP=TRUE -DCMAKE_VERBOSE_MAKEFILE=ON -DVCPKG_APPLOCAL_DEPS=OFF -DCMAKE_TOOLCHAIN_FILE=/Users/runner/work/crossbow/crossbow/vcpkg/scripts/buildsystems/vcpkg.cmake -DCMAKE_ERROR_ON_ABSOLUTE_INSTALL_DESTINATION=ON -DVCPKG_CXX_FLAGS= -DVCPKG_CXX_FLAGS_RELEASE= -DVCPKG_CXX_FLAGS_DEBUG= -DVCPKG_C_FLAGS= -DVCPKG_C_FLAGS_RELEASE= -DVCPKG_C_FLAGS_DEBUG= -DVCPKG_CRT_LINKAGE=dynamic -DVCPKG_LINKER_FLAGS= -DVCPKG_LINKER_FLAGS_RELEASE= -DVCPKG_LINKER_FLAGS_DEBUG= -DVCPKG_TARGET_ARCHITECTURE=arm64 -DCMAKE_INSTALL_LIBDIR:STRING=lib -DCMAKE_INSTALL_BINDIR:STRING=bin -D_VCPKG_ROOT_DIR=/Users/runner/work/crossbow/crossbow/vcpkg -D_VCPKG_INSTALLED_DIR=/Users/runner/work/crossbow/crossbow/vcpkg/installed -DVCPKG_MANIFEST_INSTALL=OFF -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DCMAKE_OSX_ARCHITECTURES=arm64
  Working Directory: /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/abseil/arm64-osx-static-release-rel
  Error code: 1
  See logs for more information:
    /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/abseil/config-arm64-osx-static-release-rel-CMakeCache.txt.log
    /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/abseil/config-arm64-osx-static-release-rel-out.log
    /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/abseil/config-arm64-osx-static-release-rel-err.log
Call Stack (most recent call first):
  installed/arm64-osx/share/vcpkg-cmake/vcpkg_cmake_configure.cmake:280 (vcpkg_execute_required_process)
  ports/abseil/portfile.cmake:26 (vcpkg_cmake_configure)
  scripts/ports.cmake:175 (include)
error: building abseil:arm64-osx-static-release failed with: BUILD_FAILED
Elapsed time to handle abseil:arm64-osx-
```
Re: [I] [C++] Use signed offset type in row table related structures [arrow]
zanmato1984 closed issue #40020: [C++] Use signed offset type in row table related structures URL: https://github.com/apache/arrow/issues/40020
Re: [I] [Java][Benchmarking] Java benchmarks are still broken when running with Java 17+ [arrow]
danepitkin closed issue #43412: [Java][Benchmarking] Java benchmarks are still broken when running with Java 17+ URL: https://github.com/apache/arrow/issues/43412
Re: [I] [Java] Add support for JDK version cross testing [arrow]
danepitkin closed issue #43380: [Java] Add support for JDK version cross testing URL: https://github.com/apache/arrow/issues/43380
Re: [I] [C++] Tests for the 'take' function don't exercise kernels handling chunked arrays very well [arrow]
felipecrv closed issue #43291: [C++] Tests for the 'take' function don't exercise kernels handling chunked arrays very well URL: https://github.com/apache/arrow/issues/43291
Re: [I] [Python] Test FlightStreamReader iterator [arrow]
danepitkin closed issue #42085: [Python] Test FlightStreamReader iterator URL: https://github.com/apache/arrow/issues/42085
Re: [I] [Swift] Add StructArray to ArrowReader [arrow]
kou closed issue #43169: [Swift] Add StructArray to ArrowReader URL: https://github.com/apache/arrow/issues/43169
[I] go/adbc/driver/flightsql: long delay between createPreparedStatement and getFlightInfoPreparedStatement [arrow-adbc]
aiguofer opened a new issue, #2040: URL: https://github.com/apache/arrow-adbc/issues/2040 ### What happened? We have a Python API that uses the ADBC driver to execute queries against our Java Arrow Flight SQL server. The Python server uses FastAPI + Strawberry, and when we receive a request to execute a query, we spin up a background thread to handle the execution against the AFS server. Multiple threads on the same pod could be executing queries against the AFS server at any given moment. We recently noticed some issues with hanging queries, and when looking at our DataDog traces, we noticed that there is almost a 30 minute difference between the `createPreparedStatement` request and the `getFlightInfoPreparedStatement` request. My initial guess is that this could be related to having multiple requests at the same time through the ADBC driver, but I don't have enough context about how the bindings between Go and Python work. Is there anything that jumps out at you? Is there anything we could do to help debug this? Here are pictures of the traces: (trace screenshots are attached to the issue on GitHub) ### Stack Trace _No response_ ### How can we reproduce the bug? _No response_ ### Environment/Setup Python 3.12 ADBC FlightSQL driver 1.0.0 ADBC driver manager 1.0.0
[I] Use Amazon KMS for encryption having error: "OSError: Incorrect key to columns mapping in column keys property:" [arrow]
ChanTheDataExplorer opened a new issue, #43426: URL: https://github.com/apache/arrow/issues/43426 ### Describe the usage question you have. Please include as many useful details as possible. Hi, I'm trying to use columnar encryption but am having trouble making it work. Below is the sample code:
```
KMS_KEY_ARN = 'arn:aws:kms:ap-southeast-1:643458469770:key/4c9195a3-bb54-40d3-b199-9f9bf6ea9dcf'
FOOTER_KEY_NAME = "footer_key"
COL_KEY_NAME = "column_key"

table = pa.Table.from_pydict({
    'a': ['hello'],
    'b': ['goodbye'],
    'c': ['womp']
})

encryption_config = pe.EncryptionConfiguration(
    footer_key=KMS_KEY_ARN,
    column_keys={
        KMS_KEY_ARN: ["a", "b", "c"],
    },
    encryption_algorithm="AES_GCM_V1",
    cache_lifetime=timedelta(minutes=5.0),
    data_key_length_bits=256
)

kms_connection_config = pe.KmsConnectionConfig(
    custom_kms_conf={
        FOOTER_KEY_NAME: FOOTER_KEY_ARN,
        COL_KEY_NAME: FOOTER_KEY_ARN,
    }
)

crypto_factory = pe.CryptoFactory(kms_factory)
file_encryption_properties = crypto_factory.file_encryption_properties(
    kms_connection_config, encryption_config)

with pq.ParquetWriter(path, table.schema,
                      encryption_properties=file_encryption_properties) as writer:
    writer.write_table(table)
```
### Component(s) Python
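Judging from the pyarrow encryption docs, one likely cause of the "Incorrect key to columns mapping" error is that `footer_key` and the keys of `column_keys` must be master-key *names* that the KMS client can resolve (here `footer_key`/`column_key`), not the raw ARN. A sketch under that assumption, with `kms_factory` standing in for the reporter's (unshown) KMS client factory:

```python
from datetime import timedelta
import pyarrow.parquet.encryption as pe

FOOTER_KEY_NAME = "footer_key"
COL_KEY_NAME = "column_key"

encryption_config = pe.EncryptionConfiguration(
    footer_key=FOOTER_KEY_NAME,                   # key name, not the ARN
    column_keys={COL_KEY_NAME: ["a", "b", "c"]},  # key name -> column list
    encryption_algorithm="AES_GCM_V1",
    cache_lifetime=timedelta(minutes=5.0),
    data_key_length_bits=256,
)
# The KmsClient created by kms_factory would then be responsible for
# mapping these key names to the actual AWS KMS key (e.g. KMS_KEY_ARN).
```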
Re: [I] [CI] Add wheels and java-jars to vcpkg group tasks [arrow]
assignUser closed issue #43418: [CI] Add wheels and java-jars to vcpkg group tasks URL: https://github.com/apache/arrow/issues/43418
[I] Snowflake adbc_ingest reverting back to CSV uploading [arrow-adbc]
davlee1972 opened a new issue, #2041: URL: https://github.com/apache/arrow-adbc/issues/2041 ### What happened? I'll have more time to debug this next week, but adbc_ingest() is creating CSV files with Snowflake. I'm using the latest 1.1.0 ADBC drivers. Even when I convert my CSV files to Parquet and try calling adbc_ingest(), it is sending data to Snowflake in CSV format. I'll have to downgrade my drivers for further testing. ### Stack Trace _No response_ ### How can we reproduce the bug? _No response_ ### Environment/Setup _No response_
[I] [C++][Parquet] Deprecate ColumnChunk::file_offset field [arrow]
mapleFU opened a new issue, #43427: URL: https://github.com/apache/arrow/issues/43427 ### Describe the enhancement requested https://github.com/apache/parquet-format/pull/440 ### Component(s) C++, Parquet
Re: [I] [JS] Rows will intermittently return BigInt data, instead of the expected strings [arrow]
Vectorrent closed issue #43275: [JS] Rows will intermittently return BigInt data, instead of the expected strings URL: https://github.com/apache/arrow/issues/43275
[I] [C++][FlightRPC] Flight UCX build is failing [arrow]
felipecrv opened a new issue, #43429: URL: https://github.com/apache/arrow/issues/43429 ### Describe the bug, including details regarding any error messages, version, and platform.
```
~/code/arrow/cpp/src/arrow/flight/transport/ucx/ucx_server.cc:261:33: error: implicit conversion loses integer precision: 'unsigned long' to 'socklen_t' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32]
  params.sockaddr.addrlen = addrlen;
                          ~ ^~~
~/code/arrow/cpp/src/arrow/flight/transport/ucx/ucx_server.cc:379:40: error: no member named 'DoSerializeToString' in 'arrow::flight::FlightInfo'; did you mean 'SerializeToString'?
    SERVER_RETURN_NOT_OK(driver, info->DoSerializeToString(&response));
                                       ^~~
                                       SerializeToString
~/code/arrow/cpp/src/arrow/flight/transport/ucx/ucx_server.cc:55:26: note: expanded from macro 'SERVER_RETURN_NOT_OK'
    ::arrow::Status s = (status); \
                         ^
~/code/arrow/cpp/src/arrow/flight/types.h:540:17: note: 'SerializeToString' declared here
  arrow::Status SerializeToString(std::string* out) const;
                ^
~/code/arrow/cpp/src/arrow/flight/transport/ucx/ucx_server.cc:400:40: error: no member named 'DoSerializeToString' in 'arrow::flight::PollInfo'; did you mean 'SerializeToString'?
    SERVER_RETURN_NOT_OK(driver, info->DoSerializeToString(&response));
                                       ^~~
                                       SerializeToString
~/code/arrow/cpp/src/arrow/flight/transport/ucx/ucx_server.cc:55:26: note: expanded from macro 'SERVER_RETURN_NOT_OK'
    ::arrow::Status s = (status); \
                         ^
~/code/arrow/cpp/src/arrow/flight/types.h:619:17: note: 'SerializeToString' declared here
  arrow::Status SerializeToString(std::string* out) const;
                ^
3 errors generated.
```
### Component(s) C++, FlightRPC
Re: [I] [Java][CI] Java-Jars CI is Failing with a linking error on macOS [arrow]
danepitkin closed issue #43377: [Java][CI] Java-Jars CI is Failing with a linking error on macOS URL: https://github.com/apache/arrow/issues/43377
[I] [Java][Packaging] java-jars failing on maven module [arrow]
danepitkin opened a new issue, #43432: URL: https://github.com/apache/arrow/issues/43432 ### Describe the bug, including details regarding any error messages, version, and platform. java-jars job is failing: https://github.com/ursacomputing/crossbow/actions/runs/10103282092/job/27945082133 ``` [FATAL] Non-readable POM /Users/runner/work/crossbow/crossbow/arrow/java/maven: /Users/runner/work/crossbow/crossbow/arrow/java/maven (No such file or directory) @ ``` ### Component(s) Java
Re: [I] [JS] Build warning due to missing arrow2csv.cjs file in bin [arrow]
trxcllnt closed issue #42229: [JS] Build warning due to missing arrow2csv.cjs file in bin URL: https://github.com/apache/arrow/issues/42229
Re: [I] [JS] Build fails in node v22 due to outdated `esm` package [arrow]
trxcllnt closed issue #43340: [JS] Build fails in node v22 due to outdated `esm` package URL: https://github.com/apache/arrow/issues/43340
Re: [I] [JS] `arrow2csv` bin entry has wrong extension [arrow]
trxcllnt closed issue #43341: [JS] `arrow2csv` bin entry has wrong extension URL: https://github.com/apache/arrow/issues/43341
Re: [I] [JS] When I install any apache-arrow version beyond 14 it throws error [arrow]
trxcllnt closed issue #41649: [JS] When I install any apache-arrow version beyond 14 it throws error URL: https://github.com/apache/arrow/issues/41649
[I] [CI] Building C++ libraries on Ubuntu aarch64 job takes 3+ hours to complete in java-jars [arrow]
danepitkin opened a new issue, #43434: URL: https://github.com/apache/arrow/issues/43434 ### Describe the enhancement requested Can we speed this up? Other OS/architectures take 10-30min. See java-jars job in crossbow: https://github.com/ursacomputing/crossbow/actions/runs/9817940199/job/27109855273 ### Component(s) Continuous Integration
[I] Failure to read valid file [arrow-julia]
adienes opened a new issue, #511: URL: https://github.com/apache/arrow-julia/issues/511 Both `pyarrow` and `polars` can read this table, but Arrow.jl fails: [mwe.arrow.zip](https://github.com/user-attachments/files/16395769/mwe.arrow.zip)
```
julia> Arrow.Table("mwe.arrow")
1-element ExceptionStack:
TaskFailedException

    nested task error: MethodError: no method matching init(::Nothing, ::Vector{UInt8}, ::Int64)

    Closest candidates are:
      init(::Type{T}, ::Vector{UInt8}, ::Integer) where T<:Union{Arrow.FlatBuffers.Struct, Arrow.FlatBuffers.Table}
       @ Arrow ~/.julia/packages/Arrow/5pHqZ/src/FlatBuffers/table.jl:43

    Stacktrace:
     [1] getproperty(x::Arrow.Flatbuf.Field, field::Symbol)
       @ Arrow.Flatbuf ~/.julia/packages/Arrow/5pHqZ/src/metadata/Schema.jl:542
     [2] build(field::Arrow.Flatbuf.Field, batch::Arrow.Batch, rb::Arrow.Flatbuf.RecordBatch, de::Dict{Int64, Arrow.DictEncoding}, nodeidx::Int64, bufferidx::Int64, convert::Bool)
       @ Arrow ~/.julia/packages/Arrow/5pHqZ/src/table.jl:668
     [3] iterate(x::Arrow.VectorIterator, ::Tuple{Int64, Int64, Int64})
       @ Arrow ~/.julia/packages/Arrow/5pHqZ/src/table.jl:629
     [4] copyto!(dest::Vector{Any}, src::Arrow.VectorIterator)
       @ Base ./abstractarray.jl:948
     [5] _collect
       @ ./array.jl:765 [inlined]
     [6] collect
       @ ./array.jl:759 [inlined]
     [7] macro expansion
       @ ~/.julia/packages/Arrow/5pHqZ/src/table.jl:526 [inlined]
     [8] (::Arrow.var"#102#108"{Bool, Channel{Any}, ConcurrentUtilities.OrderedSynchronizer, Dict{Int64, Arrow.DictEncoding}, Arrow.Batch, Int64})()
       @ Arrow ~/.julia/packages/ConcurrentUtilities/QOkoO/src/ConcurrentUtilities.jl:48

Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:448
 [2] macro expansion
   @ ./task.jl:480 [inlined]
 [3] Arrow.Table(blobs::Vector{Arrow.ArrowBlob}; convert::Bool)
   @ Arrow ~/.julia/packages/Arrow/5pHqZ/src/table.jl:441
 [4] Table
   @ ~/.julia/packages/Arrow/5pHqZ/src/table.jl:415 [inlined]
 [5] Table
   @ ~/.julia/packages/Arrow/5pHqZ/src/table.jl:407 [inlined]
 [6] Arrow.Table(input::String)
   @ Arrow ~/.julia/packages/Arrow/5pHqZ/src/table.jl:407
 [7] top-level scope
   @ REPL[4]:1
```
[I] [Python] `pa.Table.from_pylist` support list of tuples? [arrow]
alanhdu opened a new issue, #43435: URL: https://github.com/apache/arrow/issues/43435 ### Describe the enhancement requested I have a function that returns an iterator of tuples and would like to turn that into a pyarrow table. I have the column names separately, and I would like to use PyArrow's type inference for the actual types. I can sort of get what I want with something like:
```python
import pandas as pd

pa.Table.from_pandas(
    pd.DataFrame.from_records(tuples, columns=column_names)
)
```
But this doesn't quite work, since Pandas will cast nullable integers to floats. ### Component(s) Python
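One pandas-free workaround (a sketch, not an existing `from_pylist` overload): transpose the tuples into columns and let `pa.table` infer the types, which keeps nullable integers as integers:

```python
import pyarrow as pa

tuples = [(1, "x"), (2, None), (None, "z")]  # example data
column_names = ["a", "b"]

# Transpose rows into columns; pa.table infers int64/string and
# preserves nulls instead of casting integers to floats.
columns = [list(col) for col in zip(*tuples)]
table = pa.table(dict(zip(column_names, columns)))
```

This materializes the iterator first, so it is not a substitute for streaming support in `from_pylist` itself, but it avoids the pandas float cast.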
[I] [Java] Do not publish protobuf based on filename [arrow]
danepitkin opened a new issue, #43437: URL: https://github.com/apache/arrow/issues/43437 ### Describe the enhancement requested After migrating to Java 11, we see this warning: ``` [WARNING] * Required filename-based automodules detected: [protobuf-java-3.25.1.jar, protobuf-java-util-3.25.1.jar]. Please don't publish this project to a public artifact repository! * ``` We should upgrade to a version of protobuf that includes the `Automatic-Module-Name`, which prevents downstream projects from experiencing dependency version conflicts. ### Component(s) Java
Re: [I] [C++][FlightRPC] Flight UCX build is failing [arrow]
felipecrv closed issue #43429: [C++][FlightRPC] Flight UCX build is failing URL: https://github.com/apache/arrow/issues/43429
Re: [I] [C++] vendored abseil fails to build with gcc-13 [arrow]
assignUser closed issue #43228: [C++] vendored abseil fails to build with gcc-13 URL: https://github.com/apache/arrow/issues/43228
[I] Unable to filter a factor column in a Dataset using `%in%` [arrow]
spencerpease opened a new issue, #43440: URL: https://github.com/apache/arrow/issues/43440 ### Describe the bug, including details regarding any error messages, version, and platform. Hello, Is it possible to filter a factor using `%in%` in an Arrow Dataset? I naively expected that if I save an arrow IPC file with a factor column, I would then be able to filter that column using `%in%` when the file is loaded as an Arrow Dataset. Instead, I get the error `Type error: Array type doesn't match type of values set: string vs dictionary`. Arrow seems aware of factors though, since I can filter that same column using `==` or `!=`, and collecting the dataset without filtering returns a factor. I was able to recreate this error on both Windows and Linux; please see the attached reprex for details. Thank you in advance for your help!
``` r
# Create a simple data.frame and save as an arrow IPC file
temp_file <- tempfile()
d1 <- data.frame(x = factor(c("a", "b", "c")))
arrow::write_feather(d1, temp_file)

# Filtering using == (or !=) works
d2 <- arrow::open_dataset(temp_file, format = "arrow") |>
  dplyr::filter(x == "a") |>
  dplyr::collect()

# Filtering using %in% does not work (for single or multiple values)
d3 <- arrow::open_dataset(temp_file, format = "arrow") |>
  dplyr::filter(x %in% "a") |>
  dplyr::collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Type error: Array type doesn't match type of values set: string vs dictionary

# Collecting the dataset before filtering also works and returns a factor
d4 <- arrow::open_dataset(temp_file, format = "arrow") |>
  dplyr::collect() |>
  dplyr::filter(x %in% c("a"))

is.factor(d4$x)
#> [1] TRUE
```
Created on 2024-07-26 with [reprex v2.1.1](https://reprex.tidyverse.org) Session info
``` r
sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22631)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Los_Angeles
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.3         knitr_1.48        rlang_1.1.4
#>  [5] xfun_0.45         purrr_1.0.2       generics_0.1.3    assertthat_0.2.1
#>  [9] glue_1.7.0        bit_4.0.5         htmltools_0.5.8.1 fansi_1.0.6
#> [13] rmarkdown_2.27    tibble_3.2.1      evaluate_0.24.0   tzdb_0.4.0
#> [17] fastmap_1.2.0     yaml_2.3.9        lifecycle_1.0.4   compiler_4.4.1
#> [21] dplyr_1.1.4       fs_1.6.4          pkgconfig_2.0.3   rstudioapi_0.16.0
#> [25] digest_0.6.36     R6_2.5.1          utf8_1.2.4        reprex_2.1.1
#> [29] tidyselect_1.2.1  pillar_1.9.0      magrittr_2.0.3    tools_4.4.1
#> [33] withr_3.0.0       bit64_4.0.5       arrow_16.1.0
```
### Component(s) R
Re: [I] [C++] concatenate nested namespace in c++17 style with clang-tidy [arrow]
IndifferentArea closed issue #43421: [C++] concatenate nested namespace in c++17 style with clang-tidy URL: https://github.com/apache/arrow/issues/43421
[I] Mixing RLE_DICTIONARY and other column encodings in pyarrow parquet [arrow]
bkief opened a new issue, #43442: URL: https://github.com/apache/arrow/issues/43442 ### Describe the bug, including details regarding any error messages, version, and platform. The ValueError at https://github.com/apache/arrow/blob/aaeff72dd9cb4658913fde3d176416be9a93ebe0/python/pyarrow/_parquet.pyx#L1360-L1367 will be raised any time any column is custom-encoded with a dictionary method. This makes it impossible to mix a dictionary-encoded column with something like `DELTA_BINARY_PACKED`. I understand this is to prevent duplication of `use_dictionary`. Would it be okay to move this ValueError to the calling function instead? Does anything at the C++ level prevent this? https://github.com/apache/arrow/blob/aaeff72dd9cb4658913fde3d176416be9a93ebe0/python/pyarrow/_parquet.pyx#L1971-L1972 ### Component(s) Parquet, Python
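For illustration, a sketch of the combination being asked about, which currently raises because (per the `pyarrow.parquet.write_table` docs) `column_encoding` can only be combined with `use_dictionary=False`:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"a": ["x", "x", "y"], "b": [1, 2, 3]})

# Desired: dictionary-encode "a" while delta-encoding "b".
# Today this raises ValueError rather than writing the file:
pq.write_table(
    table,
    "mixed.parquet",
    use_dictionary=["a"],
    column_encoding={"b": "DELTA_BINARY_PACKED"},
)
```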
[I] The flight.NewRecordWriter parameter is ambiguous [arrow]
mac-zhenfang opened a new issue, #43443: URL: https://github.com/apache/arrow/issues/43443 ### Describe the usage question you have. Please include as many useful details as possible. The `func NewRecordWriter(w DataStreamWriter, opts ...ipc.Option) *Writer` API indicates that the DataStreamWriter is a required parameter and all the others are optional. But in:
```go
func (w *Writer) start() error {
	w.started = true
	w.mapper.ImportSchema(w.schema)
	w.lastWrittenDicts = make(map[int64]arrow.Array)

	// write out schema payloads
	ps := payloadFromSchema(w.schema, w.mem, &w.mapper)
	defer ps.Release()

	for _, data := range ps {
		err := w.pw.WritePayload(data)
		if err != nil {
			return err
		}
	}

	return nil
}
```
`w.schema` looks like a required parameter. If it is nil, the writer reports `arrow/ipc: unknown error while writing: runtime error: invalid memory address or nil pointer dereference`. The request is to use the schema from the RecordBatch instead of an input option. ### Component(s) Go
[I] [C++] Benchmark Arrow BinaryViewBuilder [arrow]
mapleFU opened a new issue, #43444: URL: https://github.com/apache/arrow/issues/43444 ### Describe the enhancement requested BinaryViewBuilder needs a benchmark. ### Component(s) Benchmarking, C++
Re: [I] Partitioned variable does not read in as the correct type [arrow]
thisisnic closed issue #43303: Partitioned variable does not read in as the correct type URL: https://github.com/apache/arrow/issues/43303
Re: [I] [Python] Could not read encrypted metadata via pq.read_table [arrow]
heyuqi1970 closed issue #43406: [Python] Could not read encrypted metadata via pq.read_table URL: https://github.com/apache/arrow/issues/43406
Re: [I] [R] String columns read lazily from readr error when transferred to an arrow table [arrow]
jonkeane closed issue #43349: [R] String columns read lazily from readr error when transferred to an arrow table URL: https://github.com/apache/arrow/issues/43349
[I] [C++][grpc] 0-length buffers sent to grpc can cause indefinite hangs on MacOS/iOS [arrow]
ziglerari opened a new issue, #43447: URL: https://github.com/apache/arrow/issues/43447 ### Describe the bug, including details regarding any error messages, version, and platform. A [gRPC issue](https://github.com/grpc/grpc/pull/37255) has been identified where transmitting buffers of zero length leads to a persistent hang on MacOS/iOS platforms. Such zero-length buffers may arise, for example, in the context of using the Array structure alongside a validity bitmap, as outlined in the [Arrow Spec](https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps). Essentially, when every element within an Array is valid (i.e., not null), it's possible to represent this state with a null validity bitmap, indicating that all elements are valid. This scenario is realized through the use of a null buffer, as demonstrated here: https://github.com/apache/arrow/blob/187197c369058f7d1377c1b161c469a9e4542caf/cpp/src/arrow/ipc/writer.cc#L165-L179 The relevant sections of code from both transport mechanisms are provided below for reference: https://github.com/apache/arrow/blob/187197c369058f7d1377c1b161c469a9e4542caf/cpp/src/arrow/flight/transport/grpc/serialization_internal.cc#L283-L287 https://github.com/apache/arrow/blob/187197c369058f7d1377c1b161c469a9e4542caf/cpp/src/arrow/flight/transport/ucx/ucx_internal.cc#L590-L591 Upon examining the UCX transport's approach, it's evident that a precaution is taken to avoid sending zero-length buffers. This strategy appears prudent, as it eliminates the need to forward non-transmittable buffers to the transport layer, potentially offering a solution to the issue observed with gRPC. ### Component(s) C++, FlightRPC
[I] [CI] Conan-* crossbow jobs need credentials to docker hub (or to not attempt to push there) [arrow]
jonkeane opened a new issue, #43449: URL: https://github.com/apache/arrow/issues/43449 ### Describe the bug, including details regarding any error messages, version, and platform. Both `conan-minimal` and `conan-maximal` have been failing for a long time (minimal for 116 days; the maximal one has no successful runs reported on the [crossbow report](http://crossbow.voltrondata.com)). They both seem to fail with:
```
The push refers to repository [docker.io/conanio/gcc10]
3b2ed178cc9f: Preparing
5f70bf18a086: Preparing
a7c9350b994b: Preparing
28da0445c449: Preparing
28da0445c449: Layer already exists
a7c9350b994b: Layer already exists
3b2ed178cc9f: Layer already exists
5f70bf18a086: Layer already exists
errors:
denied: requested access to the resource is denied
unauthorized: authentication required
Traceback (most recent call last):
  File "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/docker/core.py", line 224, in _execute_docker
    result = Docker().run(*args, **kwargs)
  File "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/utils/command.py", line 78, in run
    return subprocess.run(invocation, **kwargs)
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'push', 'conanio/gcc10:1.62.0']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.4/x64/bin/archery", line 8, in <module>
    sys.exit(archery())
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/click/decorators.py", line 45, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/docker/cli.py", line 259, in docker_compose_push
    compose.push(image, user=user, ***
  File "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/docker/core.py", line 447, in push
    _push(service)
  File "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/docker/core.py", line 430, in _push
    return self._execute_docker('push', service['image'])
  File "/home/runner/work/crossbow/crossbow/arrow/dev/archery/archery/docker/core.py", line 227, in _execute_docker
    raise RuntimeError(
RuntimeError: docker push conanio/gcc10:1.62.0 exited with non-zero exit code 1
```
[recent log](https://github.com/ursacomputing/crossbow/actions/runs/10122212172/job/27994040083).
@kou It looks like you were working on those most recently. Do you have the credentials we need? Or know if we can disable the uploading? Or possibly remove those tasks from Crossbow? ### Component(s) Continuous Integration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [R] RStudio crash [arrow]
jonkeane closed issue #43241: [R] RStudio crash URL: https://github.com/apache/arrow/issues/43241 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [Format] Add Opaque canonical extension type [arrow]
lidavidm opened a new issue, #43453: URL: https://github.com/apache/arrow/issues/43453 ### Describe the enhancement requested As proposed on the mailing list: https://lists.apache.org/thread/4pykofrzvkl7dwsnzys8rwnq2owfnt43 ### Component(s) Format -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [C++][Python] Add Opaque canonical extension type [arrow]
lidavidm opened a new issue, #43454: URL: https://github.com/apache/arrow/issues/43454 ### Describe the enhancement requested Part of #43453. ### Component(s) C++, Python -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [Go] Add Opaque canonical extension type [arrow]
lidavidm opened a new issue, #43455: URL: https://github.com/apache/arrow/issues/43455 ### Describe the enhancement requested Part of https://github.com/apache/arrow/issues/43453 ### Component(s) Go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [Java] Add Opaque canonical extension type [arrow]
lidavidm opened a new issue, #43456: URL: https://github.com/apache/arrow/issues/43456 ### Describe the enhancement requested Part of https://github.com/apache/arrow/issues/43453 ### Component(s) Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The Go flightsql driver doesn't handle scanning LargeString or LargeBinary types [arrow]
phillipleblanc closed issue #43403: The Go flightsql driver doesn't handle scanning LargeString or LargeBinary types URL: https://github.com/apache/arrow/issues/43403 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Java] Do not publish protobuf based on filename [arrow]
lidavidm closed issue #43437: [Java] Do not publish protobuf based on filename URL: https://github.com/apache/arrow/issues/43437 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Java][Packaging] java-jars failing on maven module [arrow]
lidavidm closed issue #43432: [Java][Packaging] java-jars failing on maven module URL: https://github.com/apache/arrow/issues/43432 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Java] Upgrade JNI version [arrow]
lidavidm closed issue #43425: [Java] Upgrade JNI version URL: https://github.com/apache/arrow/issues/43425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [C++][grpc] Sending 0-length buffers to gRPC can result in indefinite hangs on MacOS/iOS platforms [arrow]
lidavidm closed issue #43447: [C++][grpc] Sending 0-length buffers to gRPC can result in indefinite hangs on MacOS/iOS platforms URL: https://github.com/apache/arrow/issues/43447 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [C++][Gandiva] Attribute mismatch error with unity build [arrow]
kou opened a new issue, #43463: URL: https://github.com/apache/arrow/issues/43463 ### Describe the bug, including details regarding any error messages, version, and platform.
```diff
[678/708] Building CXX object src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx.o
FAILED: src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx.o
/opt/homebrew/bin/ccache /Library/Developer/CommandLineTools/usr/bin/c++ -DARROW_EXTRA_ERROR_CONTEXT -DARROW_HAVE_NEON -DARROW_STATIC -DARROW_WITH_TIMING_TESTS -DGANDIVA_STATIC -DGANDIVA_UNIT_TEST=1 -I/Users/kou/work/cpp/arrow/cpp.build/src -I/Users/kou/work/cpp/arrow/cpp/src -I/Users/kou/work/cpp/arrow/cpp/src/generated -isystem /Users/kou/work/cpp/arrow/cpp/thirdparty/flatbuffers/include -isystem /Users/kou/work/cpp/arrow/cpp.build/_deps/googletest-src/googletest/include -isystem /Users/kou/work/cpp/arrow/cpp.build/_deps/googletest-src/googletest -isystem /Users/kou/work/cpp/arrow/cpp.build/_deps/googletest-src/googlemock/include -isystem /Users/kou/work/cpp/arrow/cpp.build/_deps/googletest-src/googlemock -isystem /opt/homebrew/include -fno-aligned-new -Qunused-arguments -fcolor-diagnostics -Wall -Wextra -Wdocumentation -DARROW_WARN_DOCUMENTATION -Wshorten-64-to-32 -Wno-missing-braces -Wno-unused-parameter -Wno-constant-logical-operand -Wno-return-stack-address -Wdate-time -Wno-unknown-warning-option -Wno-pass-failed -march=armv8-a -g -Werror -O0 -ggdb -std=c++17 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.0.sdk -fPIE -MD -MT src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx.o -MF src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx.o.d -o src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx.o -c /Users/kou/work/cpp/arrow/cpp.build/src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx
In file included from /Users/kou/work/cpp/arrow/cpp.build/src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx:7:
In file included from /Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/bitmap_test.cc:19:
In file included from /Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/types.h:22:
/Users/kou/work/cpp/arrow/cpp/src/gandiva/gdv_function_stubs.h:77:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
GANDIVA_EXPORT
^
/Users/kou/work/cpp/arrow/cpp/src/gandiva/visibility.h:39:39: note: expanded from macro 'GANDIVA_EXPORT'
#define GANDIVA_EXPORT __attribute__((visibility("default")))
                                      ^
/Users/kou/work/cpp/arrow/cpp/src/gandiva/context_helper.cc:63:6: note: previous definition is here
void gdv_fn_context_set_error_msg(int64_t context_ptr, char const* err_msg) {
     ^
In file included from /Users/kou/work/cpp/arrow/cpp.build/src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx:7:
In file included from /Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/bitmap_test.cc:19:
In file included from /Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/types.h:22:
/Users/kou/work/cpp/arrow/cpp/src/gandiva/gdv_function_stubs.h:80:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
GANDIVA_EXPORT
^
/Users/kou/work/cpp/arrow/cpp/src/gandiva/visibility.h:39:39: note: expanded from macro 'GANDIVA_EXPORT'
#define GANDIVA_EXPORT __attribute__((visibility("default")))
                                      ^
/Users/kou/work/cpp/arrow/cpp/src/gandiva/context_helper.cc:68:10: note: previous definition is here
uint8_t* gdv_fn_context_arena_malloc(int64_t context_ptr, int32_t size) {
         ^
2 errors generated.
```
`src/gandiva/precompiled/CMakeFiles/gandiva-precompiled-test.dir/Unity/unity_0_cxx.cxx`:
```cpp
/* generated by CMake */

/* NOLINTNEXTLINE(bugprone-suspicious-include,misc-include-cleaner) */
#include "/Users/kou/work/cpp/arrow/cpp/src/gandiva/context_helper.cc"
/* NOLINTNEXTLINE(bugprone-suspicious-include,misc-include-cleaner) */
#include "/Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/bitmap_test.cc"
/* NOLINTNEXTLINE(bugprone-suspicious-include,misc-include-cleaner) */
#include "/Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/bitmap.cc"
/* NOLINTNEXTLINE(bugprone-suspicious-include,misc-include-cleaner) */
#include "/Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/epoch_time_point_test.cc"
/* NOLINTNEXTLINE(bugprone-suspicious-include,misc-include-cleaner) */
#include "/Users/kou/work/cpp/arrow/cpp/src/gandiva/precompiled/time_test.cc"
/* NOLINTNEXTLINE(bugprone-suspicious-include,misc-include-cleaner) */
#include "/Users/kou/work/cpp/arrow/cpp/src/gandiva/pr
```
[I] [Packaging][C++] Fail to build bundled ORC with the official LZ4 CMake package on Debian GNU/Linux trixie [arrow]
kou opened a new issue, #43467: URL: https://github.com/apache/arrow/issues/43467 ### Describe the bug, including details regarding any error messages, version, and platform. Recently, Debian GNU/Linux trixie started providing an LZ4 CMake package based on the official CMake build system. Our LZ4 detection code doesn't work with it: https://github.com/ursacomputing/crossbow/actions/runs/10130290903/job/28011414872#step:8:3691
```text
-- Building Apache ORC from source
CMake Error at cmake_modules/ThirdpartyToolchain.cmake:4512 (get_target_property):
  get_target_property() called with non-existent target "LZ4::lz4".
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:208 (build_orc)
  cmake_modules/ThirdpartyToolchain.cmake:304 (build_dependency)
  cmake_modules/ThirdpartyToolchain.cmake:4698 (resolve_dependency)
  CMakeLists.txt:544 (include)
```
This is because `LZ4::lz4` isn't provided by LZ4 1.9.4. ### Component(s) C++, Packaging -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [CI]: Temporarily turn off conda jobs that are failing [arrow]
raulcd closed issue #43450: [CI]: Temporarily turn off conda jobs that are failing URL: https://github.com/apache/arrow/issues/43450 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Change the default CompressionCodec.Factory to leverage compression support transparently [arrow]
ccciudatu opened a new issue, #43469: URL: https://github.com/apache/arrow/issues/43469 ### Describe the enhancement requested Application code is currently required to choose upfront between handling compressed vs. uncompressed data by specifying one of the two (mutually exclusive) `CompressionCodec.Factory` implementations: `NoCompressionCodec.Factory` and `CommonsCompressionCodecFactory`. While this is totally acceptable (or even required) for the write path (e.g. `ArrowWriter`), it makes it really tedious to support compression on the read path, as it's not reasonable to choose between handling _uncompressed-data-only_ and _compressed-data-only_ when writing (e.g.) a client app for Arrow Flight. As already reported in https://github.com/apache/arrow/issues/41457, the Java FlightClient currently fails with the following error when trying to decode a compressed stream:
```
java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionCodecFactory for LZ4_FRAME
        at org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:63)
        at org.apache.arrow.vector.compression.CompressionCodec$Factory$1.createCodec(CompressionCodec.java:91)
        at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:79)
        at org.apache.arrow.flight.FlightStream.next(FlightStream.java:275)
```
The `FlightStream` class does not explicitly pass a compression codec factory when creating a `VectorLoader`, which then uses the default `NoCompressionCodec.Factory`. Changing the default to `CommonsCompressionCodecFactory` is not an option because:
1. `CommonsCompressionCodecFactory` does not support uncompressed data
2. `arrow-compression` is not a dependency of `arrow-vector`

Instead of challenging these two design decisions, the proposed solution (upcoming PR) is to make the default `CompressionCodec.Factory` use a `ServiceLoader` to gather all the available implementations and combine them to support as many `CodecType`s as possible, falling back to the `NO_COMPRESSION` codec type (i.e. the same default as today). The arrow-compression module would then act as a service provider, so that whenever it's present in the module (or class) path, it will transparently fill in the gaps of the default factory. As a side note, this is in fact the literal meaning of the above error message (_"Please add arrow-compression module to use CommonsCompressionCodecFactory"_), so we can assume this was the original intention. ### Component(s) FlightRPC, Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
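To make the proposed mechanism concrete, here is a minimal sketch of a `ServiceLoader`-backed composite factory, using only the JDK's service-provider machinery. The `CodecFactory` interface, its methods, and the provider-registration file name below are hypothetical stand-ins modeled on the issue text, not the actual `CompressionCodec.Factory` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical stand-in for CompressionCodec.Factory; the real Arrow
// interface differs.
interface CodecFactory {
    // Codec type names this provider can handle, e.g. "LZ4_FRAME".
    Iterable<String> supportedCodecTypes();

    Object createCodec(String codecType);
}

// Default-factory sketch: discover every provider on the module/class
// path and delegate per codec type, so adding a provider module (such as
// arrow-compression) would transparently fill in the gaps.
final class CompositeCodecFactory implements CodecFactory {
    private final Map<String, CodecFactory> delegates = new HashMap<>();

    CompositeCodecFactory() {
        // Providers register via META-INF/services/CodecFactory (the
        // hypothetical service file for this sketch).
        for (CodecFactory factory : ServiceLoader.load(CodecFactory.class)) {
            for (String type : factory.supportedCodecTypes()) {
                delegates.putIfAbsent(type, factory);
            }
        }
    }

    @Override
    public Iterable<String> supportedCodecTypes() {
        return delegates.keySet();
    }

    @Override
    public Object createCodec(String codecType) {
        CodecFactory delegate = delegates.get(codecType);
        if (delegate == null) {
            // Same observable behavior as today's default: fail unless a
            // provider for this codec type is installed.
            throw new IllegalArgumentException("No codec provider for " + codecType);
        }
        return delegate.createCodec(codecType);
    }
}
```

`putIfAbsent` keeps the first provider found for each codec type, so duplicate providers on the class path do not silently override one another.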
Re: [I] go/adbc/driver/flightsql: Default Value (10 MB) For adbc.snowflake.rpc.ingest_target_file_size Not Used In 1.1.0 [arrow-adbc]
zeroshade closed issue #1997: go/adbc/driver/flightsql: Default Value (10 MB) For adbc.snowflake.rpc.ingest_target_file_size Not Used In 1.1.0 URL: https://github.com/apache/arrow-adbc/issues/1997 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Restrict direct access to `sun.misc.Unsafe` [arrow]
laurentgo opened a new issue, #43479: URL: https://github.com/apache/arrow/issues/43479 ### Describe the enhancement requested `sun.misc.Unsafe` is a Java-internal class only accessible to classes loaded by the boot classloader, unless one uses reflection to bypass this restriction. `org.apache.arrow.memory.util.MemoryUtil` makes it available as a public field to any Java class, which opens a Pandora's box. As a first step towards switching from `sun.misc.Unsafe` to safer memory access methods (which may become a requirement at some point, as discussed in [JEP 471](https://openjdk.org/jeps/471)), remove direct access to the `sun.misc.Unsafe` instance. ### Component(s) Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
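For background, the reflective pattern the issue wants to stop encouraging looks roughly like the sketch below; this is a generic illustration of the well-known `theUnsafe` grab, not Arrow's actual `MemoryUtil` code. The proposal's first step amounts to keeping the instance private and exposing only narrow, auditable helpers.

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

// Generic sketch, not Arrow's MemoryUtil: once a library stores Unsafe in
// a public static field, every class on the class path gets the full,
// unchecked memory API. Keeping the field private and exposing only
// narrow operations (like the copy helper below) restricts that access.
final class UnsafeAccess {
  private static final Unsafe UNSAFE;

  static {
    try {
      // The standard reflective bypass: read the private static
      // "theUnsafe" field of sun.misc.Unsafe.
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      UNSAFE = (Unsafe) field.get(null);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private UnsafeAccess() {}

  // A narrow, auditable entry point instead of handing out UNSAFE itself.
  static void copyMemory(long srcAddress, long dstAddress, long bytes) {
    UNSAFE.copyMemory(srcAddress, dstAddress, bytes);
  }
}
```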
Re: [I] [CI] Conan-* crossbow jobs need credentials to docker hub (or to not attempt to push there) [arrow]
assignUser closed issue #43449: [CI] Conan-* crossbow jobs need credentials to docker hub (or to not attempt to push there) URL: https://github.com/apache/arrow/issues/43449 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [C++] Test linkage error when googletest 1.15.0 is installed system wide despite bundling [arrow]
assignUser closed issue #43400: [C++] Test linkage error when googletest 1.15.0 is installed system wide despite bundling URL: https://github.com/apache/arrow/issues/43400 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Java] Java Dataset API ScanOptions expansion [arrow]
lidavidm closed issue #28866: [Java] Java Dataset API ScanOptions expansion URL: https://github.com/apache/arrow/issues/28866 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] c: fix include paths for adbc.h [arrow-adbc]
lidavidm closed issue #1150: c: fix include paths for adbc.h URL: https://github.com/apache/arrow-adbc/issues/1150 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [JAVA] [C++] Support more CsvFragmentScanOptions in JNI call [arrow]
jinchengchenghh opened a new issue, #43483: URL: https://github.com/apache/arrow/issues/43483 ### Describe the enhancement requested Support most of the options by mapping them to the corresponding C++ struct. ### Component(s) C++ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Python] Expose methods to get the device and memory_manager on the pyarrow.cuda.Context class [arrow]
jorisvandenbossche closed issue #43391: [Python] Expose methods to get the device and memory_manager on the pyarrow.cuda.Context class URL: https://github.com/apache/arrow/issues/43391 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] SEGFAULT in test_udf_via_substrait when run in CPython debug build [arrow]
lysnikolaou opened a new issue, #43487: URL: https://github.com/apache/arrow/issues/43487 ### Describe the bug, including details regarding any error messages, version, and platform. Hey everyone! 👋 I'm trying to build Arrow and PyArrow with a debug build of CPython and run the test suite, but I keep running into a segmentation fault. The crash happens [in `test_udf_via_substrait` here](https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_substrait.py#L440), and I can reproduce it using both 3.13.0b4+ and 3.12.4. The source of the segmentation fault is the `Py_INCREF` that's happening [here](https://github.com/apache/arrow/blob/main/python/pyarrow/src/arrow/python/udf.cc#L48). Because this is under a debug build, `Py_INCREF` tries to access the thread state to increase the aggregate reference count. Stripped stack trace to the `Py_INCREF` call:
```
libpython3.13td.dylib!reftotal_add (/Users/user/.pyenv/sources/3.13t-dev-debug/Python-3.13-dev/Objects/object.c:84)
libpython3.13td.dylib!_Py_INCREF_IncRefTotal (/Users/user/.pyenv/sources/3.13t-dev-debug/Python-3.13-dev/Objects/object.c:231)
libarrow_python.dylib!Py_INCREF(_object*) (/Users/user/.pyenv/versions/3.13t-dev-debug/include/python3.13td/object.h:835)
libarrow_python.dylib!arrow::py::(anonymous namespace)::PythonUdfKernelState::PythonUdfKernelState(std::__1::shared_ptr) (/Users/user/repos/python/arrow/python/pyarrow/src/arrow/python/udf.cc:48)
libarrow_python.dylib!arrow::py::(anonymous namespace)::PythonUdfKernelState::PythonUdfKernelState(std::__1::shared_ptr) (/Users/user/repos/python/arrow/python/pyarrow/src/arrow/python/udf.cc:47)
libarrow_python.dylib!std::__1::__unique_if::__unique_single std::__1::make_unique[abi:ue170006]&>(std::__1::shared_ptr&) (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__memory/unique_ptr.h:689)
libarrow_python.dylib!arrow::py::(anonymous namespace)::PythonUdfKernelInit::operator()(arrow::compute::KernelContext*, arrow::compute::KernelInitArgs const&) (/Users/user/repos/python/arrow/python/pyarrow/src/arrow/python/udf.cc:78)
libarrow_python.dylib!decltype(std::declval()(std::declval(), std::declval())) std::__1::__invoke[abi:ue170006](arrow::py::(anonymous namespace)::PythonUdfKernelInit&, arrow::compute::KernelContext*&&, arrow::compute::KernelInitArgs const&) (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__type_traits/invoke.h:340)
libarrow_python.dylib!arrow::Result>> std::__1::__invoke_void_return_wrapper>>, false>::__call[abi:ue170006](arrow::py::(anonymous namespace)::PythonUdfKernelInit&, arrow::compute::KernelContext*&&, arrow::compute::KernelInitArgs const&) (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__type_traits/invoke.h:407)
libarrow_python.dylib!std::__1::__function::__alloc_func, arrow::Result>> (arrow::compute::KernelContext*, arrow::compute::KernelInitArgs const&)>::operator()[abi:ue170006](arrow::compute::KernelContext*&&, arrow::compute::KernelInitArgs const&) (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__functional/function.h:193)
libarrow_python.dylib!std::__1::__function::__func, arrow::Result>> (arrow::compute::KernelContext*, arrow::compute::KernelInitArgs const&)>::operator()(arrow::compute::KernelContext*&&, arrow::compute::KernelInitArgs const&) (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__functional/function.h:364)
libarrow.1800.0.0.dylib!std::__1::__function::__value_func>> (arrow::compute::KernelContext*, arrow::compute::KernelInitArgs const&)>::operator()[abi:ue170006](arrow::compute::KernelContext*&&, arrow::compute::KernelInitArgs const&) const (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__functional/function.h:518)
libarrow.1800.0.0.dylib!std::__1::function>> (arrow::compute::KernelContext*, arrow::compute::KernelInitArgs const&)>::operator()(arrow::compute::KernelContext*, arrow::compute::KernelInitArgs const&) const (/Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/include/c++/v1/__functional/function.h:1169)
libarrow.1800.0.0.dylib!arrow::compute::(anonymous namespace)::BindNonRecursive(arrow::compute::Expression::Call, bool, arrow::compute::ExecContext*)::$_21::operator()() const (/Users/user/repos/python/arrow/cpp/src/arrow/compute/expression.cc:544)
libarrow.1800.0.0.dylib!arrow::compute::(anonymous namespace)::BindNonRecursive(arrow::compute::Expression::Call, bool, arrow::compute::ExecContext*) (/Users/user/repos/python/arrow/cpp/src/arrow/compute/expression.cc:560)
libarrow.1800.0.0.dylib!arrow::Result arrow::compute::(anonymous namespace)::BindImpl(arrow::compute::Expression, arrow::Schema const&, arrow::compute::ExecContext*) (/Users/user/repos/python/arrow/cpp/src/arrow/compute/expression.cc:628)
libarrow.1800.0
```
Re: [I] Snowflake adbc_ingest reverting back to CSV uploading [arrow-adbc]
davlee1972 closed issue #2041: Snowflake adbc_ingest reverting back to CSV uploading URL: https://github.com/apache/arrow-adbc/issues/2041 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Performance questions: What is the best way to upsert and in general? (Postgres) [arrow-adbc]
avm19 opened a new issue, #2046: URL: https://github.com/apache/arrow-adbc/issues/2046 ### What would you like help with? - Why is `executemany()` much slower than `adbc_ingest()`? - What is the best and most performant way to pass data with a complex query/operation? - Is there anything I am doing wrong? -- I want to insert and update records in a table using the Python API of adbc_driver_postgresql. Let's say I have 10k rows:
```python
import pyarrow as pa

a = pa.array(range(10_000))
table = pa.Table.from_arrays(arrays=[a, a, a], names=['col1', 'col2', 'col3'])
```
I noticed that `executemany()` is much slower than `adbc_ingest()` for ingesting data:
```python
# 0.1 s
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cursor:
        cursor.execute("TRUNCATE TABLE test_table;")
        cursor.adbc_ingest('test_table', table, mode="replace")
        cursor.execute('ALTER TABLE test_table ADD PRIMARY KEY ("col1");')
    conn.commit()
```
as compared to
```python
# ~7.5 s
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cursor:
        cursor.execute("TRUNCATE TABLE test_table;")
        query = 'INSERT INTO test_table ("col1", "col2", "col3") VALUES ($1, $2, $3);'
        cursor.executemany(query, table)
    conn.commit()
```
I don't mind using `adbc_ingest()` to populate my database, but later in its lifecycle I need to upsert records and more. For example, I need to do something like:
```python
# ~7.5 s
query = (
    'INSERT INTO test_table ("col1", "col2", "col3") VALUES ($1, $2, $3) '
    'ON CONFLICT ("col1") DO UPDATE SET "col2" = EXCLUDED."col2", "col3" = 0;'
)
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cursor:
        cursor.executemany(query, table)
```
which is too slow. Apparently `executemany()` is extremely inefficient for this task. What is the cause of such poor performance? What is the bottleneck? The same outcome can be achieved much faster by first ingesting the data into a temporary table and then having Postgres run the more complex operation from that table rather than from the input:
```python
# 0.2 s
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cursor:
        cursor.adbc_ingest('test_table2', table, mode="replace")
        query = (
            'INSERT INTO test_table ("col1", "col2", "col3")\n'
            'SELECT "col1", "col2", "col3" FROM test_table2\n'
            'ON CONFLICT ("col1") DO UPDATE SET "col2" = EXCLUDED."col2", "col3" = 0;'
        )
        cursor.execute(query)
    conn.commit()
```
This approach gives reasonable performance, but is this how one is supposed to do it? Is there anything that can easily be improved? I don't know much about Postgres's backend operation and what optimisations it does for ingestion, but I suspect that it is not best practice to create temporary tables (which are not even TEMPORARY) when we just want to stream data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Restrict direct access to `sun.misc.Unsafe` [arrow]
danepitkin closed issue #43479: Restrict direct access to `sun.misc.Unsafe` URL: https://github.com/apache/arrow/issues/43479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Change the default CompressionCodec.Factory to leverage compression support transparently [arrow]
danepitkin closed issue #43469: Change the default CompressionCodec.Factory to leverage compression support transparently URL: https://github.com/apache/arrow/issues/43469 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [Java] Add example using CompressionCodec [arrow-cookbook]
danepitkin opened a new issue, #354: URL: https://github.com/apache/arrow-cookbook/issues/354 The CompressionCodec now uses a ServiceLoader to load all available options. The default is NoCompressionCodec. See https://github.com/apache/arrow/pull/43471 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Packaging][C++] Fail to build bundled ORC with the official LZ4 CMake package on Debian GNU/Linux trixie [arrow]
assignUser closed issue #43467: [Packaging][C++] Fail to build bundled ORC with the official LZ4 CMake package on Debian GNU/Linux trixie URL: https://github.com/apache/arrow/issues/43467 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [C++][Java] When I use the DatasetFileWriter::write method to write a file, I can't specify a file name [arrow]
shouriken opened a new issue, #43489: URL: https://github.com/apache/arrow/issues/43489 ### Describe the usage question you have. Please include as many useful details as possible. I use this static method to write a Parquet file to the filesystem. I pass an empty partition array, so everything goes into a single file, and I set the `baseNameTemplate` arg to "test.parquet" to specify the filename, but this leads to an error: `basename_template did not contain '{i}'`.
```java
public static void write(BufferAllocator allocator, ArrowReader reader, FileFormat format, String uri,
    String[] partitionColumns, int maxPartitions, String baseNameTemplate) {
  try (final ArrowArrayStream stream = ArrowArrayStream.allocateNew(allocator)) {
    Data.exportArrayStream(allocator, reader, stream);
    JniWrapper.get().writeFromScannerToFile(stream.memoryAddress(), format.id(), uri,
        partitionColumns, maxPartitions, baseNameTemplate);
  }
}
```
Reading the C++ JNI code and the C++ dataset code, `FileSystemDatasetWriteOptions::basename_template` does not seem to support specifying a file name without `{i}`:
```cpp
/// \brief Options for writing a dataset.
struct ARROW_DS_EXPORT FileSystemDatasetWriteOptions {
  . . .
  /// Template string used to generate fragment basenames.
  /// {i} will be replaced by an auto incremented integer.
  std::string basename_template;
  . . .
}
```
Is there any way to specify the filename when there is no partitioning? ### Component(s) Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
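One possible workaround, sketched under the assumption that the `write` helper quoted above behaves as shown: keep `{i}` in the template so validation passes. With an empty partition array the writer produces exactly one fragment, so the output name is predictable (here `test-0.parquet`) and the file can be renamed afterwards if an exact name is required. The wrapper class and output URI below are hypothetical.

```java
import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;

// Hypothetical sketch: satisfy the "{i}" requirement and rely on the
// single-fragment output of an unpartitioned write for a stable name.
final class SingleFileWrite {
  static void writeSingleFile(BufferAllocator allocator, ArrowReader reader) {
    DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
        "file:///tmp/out", new String[0], /* maxPartitions */ 1024,
        "test-{i}.parquet");
    // The only file written is /tmp/out/test-0.parquet; rename it if an
    // exact filename such as test.parquet is needed.
  }
}
```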
Re: [I] Use Amazon KMS for encryption having error: "OSError: Incorrect key to columns mapping in column keys property:" [arrow]
channingdata closed issue #43426: Use Amazon KMS for encryption having error: "OSError: Incorrect key to columns mapping in column keys property:" URL: https://github.com/apache/arrow/issues/43426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [Java] `new RootAllocator()` occasionally reports error `java.lang.NoSuchFieldError: chunkSize` [arrow]
xinyiZzz opened a new issue, #43491: URL: https://github.com/apache/arrow/issues/43491 ### Describe the bug, including details regarding any error messages, version, and platform. Hi, `new RootAllocator()` occasionally reports an error. I looked through other issues and guess that the versions of `Netty` and `Arrow` are incompatible; could this be the reason for the error? Arrow version: 15.0.2 Netty version: 4.1.104.Final I am trying to upgrade to Arrow 17.0.0 to see if it can solve the problem. Thanks.
```
java.lang.NoSuchFieldError: chunkSize
	at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:153) ~[lakesoul-io-java-2.6.1-shaded.jar:4.1.104.Final]
	at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49) ~[lakesoul-io-java-2.6.1-shaded.jar:4.1.104.Final]
	at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51) ~[arrow-memory-netty-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26) ~[arrow-memory-netty-15.0.2.jar:15.0.2]
	at java.lang.Class.forName0(Native Method) ~[?:?]
	at java.lang.Class.forName(Class.java:375) ~[?:?]
	at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:773) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.arrow.memory.BaseAllocator.<init>(BaseAllocator.java:62) ~[arrow-memory-core-15.0.2.jar:15.0.2]
	at org.apache.doris.service.arrowflight.DorisFlightSqlService.<init>(DorisFlightSqlService.java:47) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.QeService.start(QeService.java:68) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.DorisFE.start(DorisFE.java:213) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.DorisFE.main(DorisFE.java:95) ~[doris-fe.jar:1.2-SNAPSHOT]
```
### Component(s) Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [C++] Thirdparty: bump lz4 to 1.10.0 [arrow]
mapleFU opened a new issue, #43492: URL: https://github.com/apache/arrow/issues/43492 ### Describe the enhancement requested It seems to bring performance improvements: https://github.com/lz4/lz4/releases/tag/v1.10.0 ### Component(s) C++ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Inline parent validity bitmap into child validity bitmap [arrow]
takaaki7 opened a new issue, #43494: URL: https://github.com/apache/arrow/issues/43494 ### Describe the enhancement requested Currently, when the struct itself is null, the child's validity is unknown, so a client must compute the AND of the two bitmaps to know the child's validity. I think inlining the parent validity into the child validity would be more effective for query engines. ### Component(s) Format -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
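For reference, the bookkeeping this proposal would remove looks like the sketch below: a struct child's effective validity is the bitwise AND of the parent struct's validity bitmap and the child's own bitmap. Plain byte arrays stand in for Arrow validity buffers here.

```java
// Minimal sketch of computing a struct child's effective validity under
// the current format: parent AND child, byte by byte over the packed
// bitmaps. Arrow bitmaps are LSB-first packed bits.
final class ValidityAnd {
  static byte[] effectiveValidity(byte[] parentBitmap, byte[] childBitmap) {
    byte[] out = new byte[childBitmap.length];
    for (int i = 0; i < childBitmap.length; i++) {
      out[i] = (byte) (parentBitmap[i] & childBitmap[i]);
    }
    return out;
  }

  static boolean isValid(byte[] bitmap, int index) {
    // Bit (index % 8) of byte (index / 8), least-significant bit first.
    return (bitmap[index >> 3] & (1 << (index & 7))) != 0;
  }
}
```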
Re: [I] [Java] `new RootAllocator()` occasionally report error `java.lang.NoSuchFieldError: chunkSize` [arrow]
vibhatha closed issue #43491: [Java] `new RootAllocator()` occasionally report error `java.lang.NoSuchFieldError: chunkSize` URL: https://github.com/apache/arrow/issues/43491 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Dataset with Filename partitioning sometimes loses files on write [arrow]
rafal-c opened a new issue, #43496: URL: https://github.com/apache/arrow/issues/43496 ### Describe the bug, including details regarding any error messages, version, and platform. Consider a simple program (code below) which creates a Table, turns it into a Dataset and writes the Dataset with Filename partitioning to a directory `/tmp/dataset`. Let's call it `myprogram`. Now if you run `myprogram` and look into /tmp/dataset repeatedly, this is what you may see:
```bash
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2019_part0.parquet 2021_part0.parquet 2022_part0.parquet
$ ./myprogram && ls /tmp/dataset
2020_part0.parquet 2021_part0.parquet 2022_part0.parquet
```
So for some reason it randomly skips parts of the dataset on write. This is not specific to Parquet and it happens on all major platforms (Linux/Windows/macOS) with Arrow 16.0.0. Here is the full code to reproduce:
```cpp
#include <arrow/api.h>
#include <arrow/dataset/api.h>
#include <arrow/filesystem/api.h>

arrow::Result<std::shared_ptr<arrow::Table>> makeTable() {
  using arrow::field;
  auto schema = arrow::schema({field("a", arrow::int64()), field("year", arrow::int64())});
  std::vector<std::shared_ptr<arrow::Array>> arrays(2);
  arrow::NumericBuilder<arrow::Int64Type> builder;
  ARROW_RETURN_NOT_OK(builder.AppendValues({5, 2, 4, 100, 2, 4}));
  ARROW_RETURN_NOT_OK(builder.Finish(&arrays[0]));
  builder.Reset();
  ARROW_RETURN_NOT_OK(builder.AppendValues({2019, 2020, 2021, 2021, 2022, 2022}));
  ARROW_RETURN_NOT_OK(builder.Finish(&arrays[1]));
  return arrow::Table::Make(schema, arrays);
}

int main() {
  namespace ds = arrow::dataset;

  // Create an Arrow Table
  auto table = makeTable().ValueOrDie();
  auto dataset = std::make_shared<ds::InMemoryDataset>(table);
  auto scanner_builder = dataset->NewScan().ValueOrDie();
  auto scanner = scanner_builder->Finish().ValueOrDie();

  // The partition schema determines which fields are part of the partitioning.
  auto partition_schema = arrow::schema({arrow::field("year", arrow::int64())});
  auto partitioning = std::make_shared<ds::FilenamePartitioning>(partition_schema);

  // We'll write Parquet files.
  auto format = std::make_shared<ds::ParquetFileFormat>();

  ds::FileSystemDatasetWriteOptions write_options;
  write_options.file_write_options = format->DefaultWriteOptions();
  write_options.existing_data_behavior = ds::ExistingDataBehavior::kDeleteMatchingPartitions;
  write_options.filesystem = std::make_shared<arrow::fs::LocalFileSystem>();
  write_options.base_dir = "/tmp/dataset";
  write_options.partitioning = partitioning;
  write_options.basename_template = "part{i}.parquet";
  return ds::FileSystemDataset::Write(write_options, scanner) != arrow::Status::OK();
}
```
### Component(s) C++ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Python][ppc64le][cuda] pytest segfault - test_cuda.py/test_foreign_buffer [arrow]
jorisvandenbossche closed issue #31432: [Python][ppc64le][cuda] pytest segfault - test_cuda.py/test_foreign_buffer URL: https://github.com/apache/arrow/issues/31432 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Python] StructArray.from_array() should accept a type (in addition to names or fields) [arrow]
AlenkaF closed issue #42014: [Python] StructArray.from_array() should accept a type (in addition to names or fields) URL: https://github.com/apache/arrow/issues/42014 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Reading partial data/first block hangs on some cloud filesystems [arrow]
dberenbaum opened a new issue, #43497: URL: https://github.com/apache/arrow/issues/43497 ### Describe the bug, including details regarding any error messages, version, and platform. Take the following example using a publicly available dataset:
```python
import gcsfs
from pyarrow.dataset import dataset

# without fsspec filesystem, get segmentation fault
fs = None
# with fsspec filesystem, hangs and never finishes
# fs = gcsfs.GCSFileSystem()

uri = "gs://datachain-demo/laion-aesthetics-csv/laion_aesthetics_1024_33M_1.csv"
ds = dataset(uri, format="csv", filesystem=fs)
print(ds.head(5))
```
As noted in the comments, depending on which filesystem is passed, it will either hang indefinitely or hit a segmentation fault. Strangely, S3 paths work (don't hang or fail) with the pyarrow filesystem but hang with the fsspec s3fs filesystem. Other findings:
- Similar operations like `ds.take()` and `next(ds.to_batches())` have the same behavior as `ds.head()`
- `ds.head(use_threads=False)` completes successfully with any filesystem but takes much longer

### Component(s) Python -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [C++] Benchmark Arrow BinaryViewBuilder [arrow]
felipecrv closed issue #43444: [C++] Benchmark Arrow BinaryViewBuilder URL: https://github.com/apache/arrow/issues/43444 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The flight.NewRecordWriter parameter is ambiguous [arrow]
joellubi closed issue #43443: The flight.NewRecordWriter parameter is ambiguous URL: https://github.com/apache/arrow/issues/43443 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [CI] Crossbow report shouldn't include jobs that aren't configured / run recently [arrow]
jonkeane opened a new issue, #43499: URL: https://github.com/apache/arrow/issues/43499 ### Describe the enhancement requested In #43451 we turned off a bunch of jobs that were constantly red, but I was surprised to see that they are still [showing up on the crossbow nightly report](http://crossbow.voltrondata.com). Would it be possible to update that report so that jobs that haven't been run in the last 2-3 days aren't in the list of builds shown? For example, for `conda-win-x64-cuda-py3` the last run was three days ago, but it's still listed in the table and looks like it ran with the most recent group and failed. cc @boshek ### Component(s) Continuous Integration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [R][CI] Bump dev docs CI job from ubuntu 20.04 [arrow]
jonkeane opened a new issue, #43500: URL: https://github.com/apache/arrow/issues/43500 ### Describe the enhancement requested Ubuntu 20.04 is quite old; let's use something more modern. ### Component(s) Continuous Integration, R -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [JAVA] Fix the Java JNI / AMD64 manylinux2014 Java JNI test not testing the dataset module [arrow]
jinchengchenghh opened a new issue, #43502: URL: https://github.com/apache/arrow/issues/43502 ### Describe the bug, including details regarding any error messages, version, and platform. https://github.com/apache/arrow/pull/41646#issuecomment-2259855172 ### Component(s) Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [JS] bump `command-line-usage` for security [arrow]
bombard1004 opened a new issue, #43505: URL: https://github.com/apache/arrow/issues/43505 ### Describe the enhancement requested The npm package `apache-arrow` depends on `command-line-usage`. A [security vulnerability](https://github.com/advisories/GHSA-28mc-g557-92m7) was discovered in one of the dependencies of `command-line-usage`, and a patch has been released. However, `apache-arrow` pins the version of `command-line-usage` exactly, which prevents the security patch from being applied. Therefore, the version of `command-line-usage` needs to be updated. ### Component(s) JavaScript -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org