adamreeve opened a new issue, #44852: URL: https://github.com/apache/arrow/issues/44852
### Describe the bug, including details regarding any error messages, version, and platform.

@pitrou pointed out that `InternalFileDecryptor` reusing the `footer_data_decryptor_` could be problematic for multi-threaded Parquet reads: https://github.com/apache/arrow/issues/43057#issuecomment-2497334650

I confirmed that this does lead to decryptor errors when scanning a Dataset with Parquet files that use uniform encryption, by modifying the existing Parquet Dataset encryption tests:

```diff
diff --git a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
index 0287d593d1..6a13b1ee37 100644
--- a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
+++ b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
@@ -90,7 +90,7 @@ class DatasetEncryptionTestBase : public ::testing::Test {
     auto encryption_config = std::make_shared<parquet::encryption::EncryptionConfiguration>(
         std::string(kFooterKeyName));
-    encryption_config->column_keys = kColumnKeyMapping;
+    encryption_config->uniform_encryption = true;
     auto parquet_encryption_config = std::make_shared<ParquetEncryptionConfig>();
     // Directly assign shared_ptr objects to ParquetEncryptionConfig members
     parquet_encryption_config->crypto_factory = crypto_factory_;
```

This causes `DatasetEncryptionTest::WriteReadDatasetWithEncryption` to fail with an error like:

```
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:159: Failure
Failed
'_error_or_value28.status()' failed with IOError: AesDecryptor was wiped out
Deserializing page header failed.
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:109  LoadBatch(batch_size)
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:1263  ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column)
/home/adam/dev/arrow/cpp/src/arrow/util/parallel.h:95  func(i, inputs[i])
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:208: Failure
Expected: TestScanDataset() doesn't generate new fatal failures in the current thread.
  Actual: it does.
```

For `LargeRowEncryptionTest::ReadEncryptLargeRows`, I sometimes get the same `AesDecryptor was wiped out` error, but also see errors like:

```
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:159: Failure
Failed
'_error_or_value28.status()' failed with IOError: Failed decryption finalization
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:109  LoadBatch(batch_size)
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:1263  ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column)
/home/adam/dev/arrow/cpp/src/arrow/util/parallel.h:95  func(i, inputs[i])
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:265: Failure
Expected: TestScanDataset() doesn't generate new fatal failures in the current thread.
  Actual: it does.
```

I don't think it's possible to reproduce this from PyArrow alone, as the `uniform_encryption` setting isn't exposed in PyArrow.

### Component(s)

C++, Parquet

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org