[I] `S3FileSystem` slow to deserialize due to AWS rule engine JSON parsing [arrow]

via GitHub Wed, 28 Feb 2024 08:35:30 -0800


fjetter opened a new issue, #40279:
URL: https://github.com/apache/arrow/issues/40279


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Deserializing a pickled `S3FileSystem` instance is surprisingly slow.
   
   
   ```python
   
   import pickle

   import boto3
   from pyarrow.fs import S3FileSystem
   
   # Going via boto is not strictly necessary, but setting all the keys and
   # tokens up front avoids one HTTP request during init.
   session = boto3.session.Session()
   credentials = session.get_credentials()
   
   fs = S3FileSystem(
       secret_key=credentials.secret_key,
       access_key=credentials.access_key,
       region="us-east-2",
       session_token=credentials.token,
   )
   # Note: the slowdown also shows up with a bare S3FileSystem(), but that
   # issues one HTTP request, and I want to emphasize the slow JSON parser (see below).
   ```
   
   ```python
   %timeit pickle.loads(pickle.dumps(fs))
   ```
   This takes `1.01 ms ± 153 µs per loop` on my machine.
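
Outside IPython, the same kind of measurement can be made with the standard-library `timeit` module. The following is a minimal sketch: `time_roundtrip` is a hypothetical helper name, and a plain dict stands in for the filesystem object so that the snippet runs without pyarrow or AWS credentials.

```python
import pickle
import timeit


def time_roundtrip(obj, number=1000):
    """Return the mean time in seconds of one pickle round trip of `obj`."""
    # Fail early if the object cannot be pickled at all.
    pickle.loads(pickle.dumps(obj))
    total = timeit.timeit(lambda: pickle.loads(pickle.dumps(obj)), number=number)
    return total / number


# Stand-in object; pass the `fs` instance created above to reproduce the issue.
per_loop = time_roundtrip({"region": "us-east-2"})
print(f"{per_loop * 1e6:.1f} µs per loop")
```

Unlike `%timeit`, this reports a single mean rather than a mean with a standard deviation, so the number is only roughly comparable.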
   
   A py-spy profile shows that most of the time is spent in internal JSON parsing, apparently in the AWS rule engine. Is there a way to avoid this?
   
   
![image](https://github.com/apache/arrow/assets/8629629/0c9895a6-6550-4d2d-8936-a1bf193dadb3)
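
On the question of avoiding the cost, one caller-side option is to memoize deserialization within a process, so that identical pickled payloads are reconstructed only once. This is a sketch under assumptions, not something pyarrow provides: `loads_cached` is a hypothetical helper, it hands back the same shared instance for equal payloads, and it assumes that instance is safe to reuse across callers. A plain dict again stands in for the filesystem object.

```python
import functools
import pickle


@functools.lru_cache(maxsize=32)
def loads_cached(payload: bytes):
    """Deserialize `payload` once per process; repeated calls reuse the result."""
    return pickle.loads(payload)


# Stand-in payload; in the scenario above this would be pickle.dumps(fs).
payload = pickle.dumps({"region": "us-east-2"})
first = loads_cached(payload)
second = loads_cached(payload)  # cache hit: pickle.loads is not called again
info = loads_cached.cache_info()
print(first is second, info.hits, info.misses)
```

This only helps when the same payload is deserialized repeatedly in one process. The first load still pays the full parsing cost.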
   
   
   ### Component(s)
   
   Python


