colin-r-schultz opened a new issue, #45371: URL: https://github.com/apache/arrow/issues/45371
### Describe the bug, including details regarding any error messages, version, and platform.

The following example test case, when run under TSAN, reports a data race.

```cpp
TEST_F(TestRecordBatch, ColumnsThreadSafety) {
  const int length = 10;

  random::RandomArrayGenerator gen(42);
  std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
  auto schema = ::arrow::schema({field("f1", utf8())});
  auto record_batch = RecordBatch::Make(schema, length, {array_data});
  std::atomic_bool start_flag{false};
  std::thread t([record_batch, &start_flag]() {
    start_flag.store(true);
    auto columns = record_batch->columns();
    ASSERT_EQ(columns.size(), 1);
  });
  // Wait for thread startup
  while (!start_flag.load()) {
  }
  auto columns = record_batch->columns();
  ASSERT_EQ(columns.size(), 1);
  t.join();
}
```

The relevant definitions in `record_batch.cc` are below:

```cpp
const std::vector<std::shared_ptr<Array>>& columns() const override {
  for (int i = 0; i < num_columns(); ++i) {
    // Force all columns to be boxed
    column(i);
  }
  return boxed_columns_;
}

std::shared_ptr<Array> column(int i) const override {
  std::shared_ptr<Array> result = std::atomic_load(&boxed_columns_[i]);
  if (!result) {
    result = MakeArray(columns_[i]);
    std::atomic_store(&boxed_columns_[i], result);
  }
  return result;
}
```

The `columns()` method returns a reference to the `mutable` member `boxed_columns_`, assuming that it is fully initialized and will not be written to again. However, multiple threads can race to initialize `boxed_columns_[i]`: a thread that loaded a null pointer before another thread's store will still perform its own `std::atomic_store` afterwards, so additional atomic writes can happen after `column(i)` has been called for the first time. These atomic writes can race against non-atomic reads of the `boxed_columns_` vector after it is returned by `columns()`. This is undefined behavior and can lead to a use-after-free of the contained `Array`s.

### Component(s)

C++
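As an illustration of the pattern described above, here is a minimal, Arrow-free sketch of one possible direction: publishing each boxed column with a compare-and-exchange so that a slot is written at most once. `RawColumn`, `BoxedColumn`, and `LazyBatch` are hypothetical stand-ins for `ArrayData`, `Array`, and the record batch implementation. This is not necessarily the fix the project adopted, and whether it fully satisfies TSAN for the non-atomic reads made through `columns()` has not been verified here.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::ArrayData and arrow::Array.
struct RawColumn {
  int value;
};
struct BoxedColumn {
  int value;
};

// Hypothetical, simplified batch with lazily boxed columns.
class LazyBatch {
 public:
  explicit LazyBatch(std::vector<RawColumn> raw)
      : raw_(std::move(raw)), boxed_(raw_.size()) {}

  std::shared_ptr<BoxedColumn> column(std::size_t i) const {
    std::shared_ptr<BoxedColumn> result = std::atomic_load(&boxed_[i]);
    if (!result) {
      auto fresh = std::make_shared<BoxedColumn>(BoxedColumn{raw_[i].value});
      // Store `fresh` only if the slot is still empty. If another thread
      // already filled it, the slot is left untouched and `result` is
      // updated to the pointer that was stored first.
      if (std::atomic_compare_exchange_strong(&boxed_[i], &result, fresh)) {
        result = fresh;
      }
    }
    return result;
  }

  const std::vector<std::shared_ptr<BoxedColumn>>& columns() const {
    for (std::size_t i = 0; i < boxed_.size(); ++i) {
      column(i);  // Force all columns to be boxed
    }
    return boxed_;
  }

 private:
  std::vector<RawColumn> raw_;
  mutable std::vector<std::shared_ptr<BoxedColumn>> boxed_;
};
```

The difference from the original `column()` is that a thread losing the initialization race discards its freshly built object instead of overwriting the slot, so every caller observes the same boxed pointer once a slot has been filled. The cost is that a losing thread may build a boxed column that is thrown away.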