adamreeve opened a new issue, #44852:
URL: https://github.com/apache/arrow/issues/44852

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   @pitrou pointed out that `InternalFileDecryptor` reusing the `footer_data_decryptor_` could be problematic for multi-threaded Parquet reads: https://github.com/apache/arrow/issues/43057#issuecomment-2497334650
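   To illustrate the hazard in isolation: the sketch below does not use Arrow's real classes; `FakeDecryptor` is a hypothetical stand-in for a stateful decryptor like `AesDecryptor` that becomes unusable once wiped out. If one cached instance is shared across readers and any of them wipes it, every later use fails, which matches the kind of error shown further down.

   ```cpp
   #include <iostream>
   #include <memory>
   #include <stdexcept>
   #include <string>

   // Hypothetical stand-in for a stateful decryptor such as AesDecryptor:
   // usable until WipeOut() is called, after which every call fails.
   // (Illustration only; this is not Arrow's real API.)
   class FakeDecryptor {
    public:
     std::string Decrypt(const std::string& ciphertext) {
       if (wiped_) throw std::runtime_error("AesDecryptor was wiped out");
       return ciphertext;  // identity "decryption", enough for the sketch
     }
     void WipeOut() { wiped_ = true; }

    private:
     bool wiped_ = false;
   };

   int main() {
     // The problematic pattern: one decryptor cached and shared by all readers.
     auto cached = std::make_shared<FakeDecryptor>();
     std::cout << cached->Decrypt("page 1") << "\n";

     cached->WipeOut();  // e.g. one reader tearing down its file state

     try {
       cached->Decrypt("page 2");  // another reader still holds the cached object
     } catch (const std::exception& e) {
       std::cout << "error: " << e.what() << "\n";
     }

     // A decryptor that is not shared across readers sidesteps the lifetime issue.
     FakeDecryptor fresh;
     std::cout << fresh.Decrypt("page 2") << "\n";
     return 0;
   }
   ```

   With a real multi-threaded scan, the wipe and the decrypt race nondeterministically instead of happening in this fixed order, which is why the failures below are intermittent.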
   
   I confirmed that this does lead to decryptor errors when scanning a Dataset with Parquet files that use uniform encryption, by modifying the existing Parquet Dataset encryption tests:
   
   ```diff
   diff --git a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
   index 0287d593d1..6a13b1ee37 100644
   --- a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
   +++ b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
   @@ -90,7 +90,7 @@ class DatasetEncryptionTestBase : public ::testing::Test {
        auto encryption_config =
            std::make_shared<parquet::encryption::EncryptionConfiguration>(
                std::string(kFooterKeyName));
   -    encryption_config->column_keys = kColumnKeyMapping;
   +    encryption_config->uniform_encryption = true;
        auto parquet_encryption_config = std::make_shared<ParquetEncryptionConfig>();
        // Directly assign shared_ptr objects to ParquetEncryptionConfig members
        parquet_encryption_config->crypto_factory = crypto_factory_;
   
   ```
   
   This causes `DatasetEncryptionTest::WriteReadDatasetWithEncryption` to fail with an error like:
   ```
   /home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:159: Failure
   Failed
   '_error_or_value28.status()' failed with IOError: AesDecryptor was wiped outDeserializing page header failed.
   
   /home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:109  LoadBatch(batch_size)
   /home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:1263  ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column)
   /home/adam/dev/arrow/cpp/src/arrow/util/parallel.h:95  func(i, inputs[i])
   
   /home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:208: Failure
   Expected: TestScanDataset() doesn't generate new fatal failures in the current thread.
     Actual: it does.
   ```
   
   For `LargeRowEncryptionTest::ReadEncryptLargeRows`, I sometimes get the same `AesDecryptor was wiped out` error, but also see errors like:
   ```
   /home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:159: Failure
   Failed
   '_error_or_value28.status()' failed with IOError: Failed decryption finalization
   /home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:109  LoadBatch(batch_size)
   /home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:1263  ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column)
   /home/adam/dev/arrow/cpp/src/arrow/util/parallel.h:95  func(i, inputs[i])
   
   /home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:265: Failure
   Expected: TestScanDataset() doesn't generate new fatal failures in the current thread.
     Actual: it does.
   ```
   
   I don't think it's possible to reproduce this from PyArrow alone, as the `uniform_encryption` setting isn't exposed in PyArrow.
   
   ### Component(s)
   
   C++, Parquet

