colin-r-schultz opened a new issue, #45371:
URL: https://github.com/apache/arrow/issues/45371

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   The following example test case, when run under TSAN, reports a data race.
   ```cpp
   TEST_F(TestRecordBatch, ColumnsThreadSafety) {
     const int length = 10;
   
     random::RandomArrayGenerator gen(42);
     std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), 
length)->data();
     auto schema = ::arrow::schema({field("f1", utf8())});
     auto record_batch = RecordBatch::Make(schema, length, {array_data});
     std::atomic_bool start_flag{false};
     std::thread t([record_batch, &start_flag]() {
       start_flag.store(true);
       auto columns = record_batch->columns();
       ASSERT_EQ(columns.size(), 1);
     });
     // Wait for thread startup
     while (!start_flag.load()) {
     };
     auto columns = record_batch->columns();
     ASSERT_EQ(columns.size(), 1);
     t.join();
   }
   ```
   The relevant definitions in `record_batch.cc` are below
   ```cpp
     const std::vector<std::shared_ptr<Array>>& columns() const override {
       for (int i = 0; i < num_columns(); ++i) {
         // Force all columns to be boxed
         column(i);
       }
       return boxed_columns_;
     }
   
     std::shared_ptr<Array> column(int i) const override {
       std::shared_ptr<Array> result = std::atomic_load(&boxed_columns_[i]);
       if (!result) {
         result = MakeArray(columns_[i]);
         std::atomic_store(&boxed_columns_[i], result);
       }
       return result;
     }
   ```
   The `columns()` method returns a reference to `mutable boxed_columns_`, 
assuming that it is fully initialized and will not be written to again. 
However, multiple threads can race to initialize `boxed_columns_[i]`, leading 
to additional atomic writes after `column(i)` has been called for the first 
time. These atomic writes can race against non atomic reads of the 
`boxed_columns_` vector after it is returned by `columns()`. This is undefined 
behavior and can lead to a use-after-free of the contained `Array`s.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to