apmorton opened a new issue, #46214:
URL: https://github.com/apache/arrow/issues/46214

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   When constructing an `S3FileSystem` object with a region explicitly specified, 
Arrow still triggers AWS metadata lookup operations unless they are 
explicitly (and globally) disabled with the `AWS_EC2_METADATA_DISABLED` 
environment variable.
   
   This goes against the documented (and I believe intended?) behavior of the 
`region` kwarg:
   ```
   AWS region to connect to. If not set, the AWS SDK will attempt to determine 
the region using heuristics such as environment variables, configuration 
profile, EC2 metadata, or default to ‘us-east-1’ when SDK version <1.8.
   ```
   
   ```python
   pyarrow.fs.S3FileSystem(
       access_key='key',
       secret_key='secret',
       endpoint_override='https://my.appliance.uri',
       region='region',
   )
   ```
   
   Using py-spy, I can observe the following stack:
   ```
       Aws::Http::CurlHttpClient::MakeRequest (libaws-cpp-sdk-core.so)
       Aws::Internal::AWSHttpResourceClient::GetResourceWithAWSWebServiceResult[abi:cxx11] (libaws-cpp-sdk-core.so)
       Aws::Internal::EC2MetadataClient::GetCurrentRegion[abi:cxx11] (libaws-cpp-sdk-core.so)
       Aws::Client::ClientConfiguration::ClientConfiguration (libaws-cpp-sdk-core.so)
       Aws::S3::S3ClientConfiguration::S3ClientConfiguration (libaws-cpp-sdk-s3.so)
       __gnu_cxx::new_allocator<arrow::fs::S3FileSystem::Impl>::construct<arrow::fs::S3FileSystem::Impl, arrow::fs::S3Options const&, arrow::io::IOContext const&> (libarrow.so.1801.0.0)
       arrow::fs::S3FileSystem::S3FileSystem (libarrow.so.1801.0.0)
       arrow::fs::S3FileSystem::Make (libarrow.so.1801.0.0)
       S3FileSystem___init__ (pyarrow/_s3fs.cpython-311-x86_64-linux-gnu.so)
   ```
   
   This is caused by the default construction of `S3ClientConfiguration` in 
`ClientBuilder`.
   On our machines (which aren't in AWS and have no IMDS running), this takes 6+ 
seconds.
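
To reproduce the delay, the constructor call can be wrapped in a small timer. This is only a sketch: `time_call` and `time_s3_construction` are hypothetical helper names, and the arguments are the placeholder values from the example above. The measured time will depend on the machine and network.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def time_call(factory: Callable[[], T]) -> Tuple[T, float]:
    """Call factory() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = factory()
    return result, time.perf_counter() - start


def time_s3_construction() -> float:
    """Time one S3FileSystem construction (needs pyarrow built with S3 support)."""
    import pyarrow.fs  # imported lazily so time_call stays usable without pyarrow

    _, seconds = time_call(
        lambda: pyarrow.fs.S3FileSystem(
            access_key="key",
            secret_key="secret",
            endpoint_override="https://my.appliance.uri",
            region="region",
        )
    )
    return seconds
```

Running `time_s3_construction()` with and without `AWS_EC2_METADATA_DISABLED=true` in the environment should show whether the metadata lookup accounts for the delay.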
   
   A workaround is something like the following:
   ```cpp
   #ifdef ARROW_S3_HAS_S3CLIENT_CONFIGURATION
     Aws::S3::S3ClientConfiguration client_config_{Aws::Client::ClientConfigurationInitValues{false}};
   #else
     Aws::Client::ClientConfiguration client_config_{Aws::Client::ClientConfigurationInitValues{false}};
   #endif
   ```
   which disables IMDS during configuration construction.
   
   Some additional work would be required to add back the IMDS lookup of the 
region when it is not otherwise specified.
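
For Python users who cannot patch Arrow, the `AWS_EC2_METADATA_DISABLED` environment variable mentioned at the top of this report can be set only around the constructor call rather than for the whole session. The sketch below assumes the AWS SDK reads the variable at the moment the filesystem is constructed, which I have not verified for every SDK version. `ec2_metadata_disabled` and `make_s3_filesystem` are hypothetical helper names, not part of pyarrow.

```python
import os
from contextlib import contextmanager
from typing import Iterator

_ENV_VAR = "AWS_EC2_METADATA_DISABLED"


@contextmanager
def ec2_metadata_disabled() -> Iterator[None]:
    """Temporarily set AWS_EC2_METADATA_DISABLED=true, restoring the old value on exit."""
    previous = os.environ.get(_ENV_VAR)
    os.environ[_ENV_VAR] = "true"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(_ENV_VAR, None)
        else:
            os.environ[_ENV_VAR] = previous


def make_s3_filesystem(**kwargs):
    """Hypothetical wrapper: build a pyarrow S3FileSystem with the lookup disabled."""
    import pyarrow.fs  # imported lazily so the context manager works without pyarrow

    with ec2_metadata_disabled():
        return pyarrow.fs.S3FileSystem(**kwargs)
```

The environment is process-wide, so this is not safe if other threads construct AWS clients at the same time. It is a stopgap and does not replace a fix in `ClientBuilder`.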
   
   ### Component(s)
   
   C++

