dotnwat opened a new issue, #44372: URL: https://github.com/apache/arrow/issues/44372
### Describe the bug, including details regarding any error messages, version, and platform.

After upgrading to clang 18, we are getting a `misaligned-pointer-use` UBSan error when reading a Parquet table. We tried both the Arrow 16 and Arrow 17 releases.

```
/v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/arrow/util/ubsan.h:66:21: runtime error: load of misaligned address 0x7f0787c2d3c2 for type 'const unsigned int *', which requires 4 byte alignment
--
| 0x7f0787c2d3c2: note: pointer points here
| a5 bd 06 0b 40 20 0c 44 61 1c 48 a2 2c 4c e3 3c 50 24 4d 54 65 5d 58 a6 6d 5c e7 7d 60 28 8e 24
| ^
| SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/arrow/util/ubsan.h:66:21
```

```
| [Backtrace #8]
| __sanitizer::Abort() at /llvm/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff/src/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:143
|
| [Backtrace #9]
| __sanitizer::Die() at /llvm/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff/src/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:58
|
| [Backtrace #10]
| __ubsan::ScopedReport::~ScopedReport() at /llvm/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff/src/compiler-rt/lib/ubsan/ubsan_diag.cpp:402
|
| [Backtrace #11]
| handleTypeMismatchImpl(__ubsan::TypeMismatchData*, unsigned long, __ubsan::ReportOptions) at /llvm/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff/src/compiler-rt/lib/ubsan/ubsan_handlers.cpp:137
|
| [Backtrace #12]
| __ubsan_handle_type_mismatch_v1 at /llvm/3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff/src/compiler-rt/lib/ubsan/ubsan_handlers.cpp:142
|
| [Backtrace #13]
| int arrow::internal::unpack32_specialized<arrow::internal::(anonymous namespace)::UnpackBits512<(arrow::internal::DispatchLevel)3> >(unsigned int const*, unsigned int*, int, int) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/arrow/util/ubsan.h:66
|
| [Backtrace #14]
| int arrow::bit_util::BitReader::GetBatch<int>(int, int*, int) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/arrow/util/bit_stream_utils.h:342
|
| [Backtrace #15]
| int arrow::util::RleDecoder::GetBatchWithDict<long>(long const*, int, long*, int) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/arrow/util/rle_encoding.h:580
|
| [Backtrace #16]
| parquet::(anonymous namespace)::DictDecoderImpl<parquet::PhysicalType<(parquet::Type::type)2> >::Decode(long*, int) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/encoding.cc:1634
|
| [Backtrace #17]
| parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadValuesDense(long) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/column_reader.cc:1824
|
| [Backtrace #18]
| parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecordData(long) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/column_reader.cc:1879
|
| [Backtrace #19]
| parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecords(long) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/column_reader.cc:1425
|
| [Backtrace #20]
| parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:482
|
| [Backtrace #21]
| parquet::arrow::ColumnReaderImpl::NextBatch(long, std::__1::shared_ptr<arrow::ChunkedArray>*) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:109
|
| [Backtrace #22]
| parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::__1::vector<int, std::__1::allocator<int> > const&, parquet::arrow::ColumnReader*, std::__1::shared_ptr<arrow::ChunkedArray>*) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:284
|
| [Backtrace #23]
| parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_0::operator()(unsigned long, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl>) const at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:1252
|
| [Backtrace #24]
| parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/arrow/util/parallel.h:95
|
| [Backtrace #25]
| parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::shared_ptr<arrow::Table>*) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:1231
|
| [Backtrace #26]
| parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::shared_ptr<arrow::Table>*) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:199
|
| [Backtrace #27]
| parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::__1::shared_ptr<arrow::Table>*) at /v/build/v_deps_build/arrow-prefix/src/arrow/cpp/src/parquet/arrow/reader.cc:300
```

### Component(s)

C++
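
For context on what this class of report means: the reported address `0x7f0787c2d3c2` ends in `c2`, so it is 2-byte aligned but not 4-byte aligned, and the sanitizer flags a load through a `const unsigned int *` at that address. The sketch below is an illustration only, not Arrow's code. `LoadU32Unaligned` and `DemoUnalignedLoad` are hypothetical names made up for this example. It shows the general `std::memcpy` technique for reading a 32-bit value from a byte address without a misaligned typed dereference.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): read a 32-bit value from an
// address that may not be 4-byte aligned. std::memcpy copies bytes and has
// no alignment requirement on its source, so no typed load takes place.
inline std::uint32_t LoadU32Unaligned(const void* src) {
  std::uint32_t value;
  std::memcpy(&value, src, sizeof(value));
  return value;
}

// Hypothetical demo: store a value at offset 2 of a 4-byte-aligned buffer
// (so the address is 2 mod 4, like the one in the report) and read it back.
inline void DemoUnalignedLoad() {
  alignas(4) unsigned char buffer[8] = {};
  const std::uint32_t expected = 0x0B06BDA5u;
  std::memcpy(buffer + 2, &expected, sizeof(expected));

  // Undefined behavior, and what -fsanitize=alignment reports:
  //   *reinterpret_cast<const std::uint32_t*>(buffer + 2)

  assert(LoadU32Unaligned(buffer + 2) == expected);
}
```

Whether the load at `ubsan.h:66` should already be safe in this sense, and why clang 18 reports it anyway, is the open question in this issue; the sketch does not answer it.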