manjum-a11y opened a new issue, #14942: URL: https://github.com/apache/iceberg/issues/14942
### Apache Iceberg version

1.7.2

### Query engine

Spark

### Please describe the bug 🐞

When reading a large Iceberg table from S3 using S3FileIO with S3 Access Grants enabled, Spark jobs intermittently fail with a NullPointerException inside the AWS SDK v2 AttributeMap$Builder.resolveValue, called from S3AccessGrantsIdentityProvider.resolveIdentity.

The failure only appears under high concurrency on large datasets (e.g., spark.read.table(...).count() over many files). Smaller tables or lower parallelism may run successfully, but increasing parallelism makes the failure reproducible.

The error message from the AWS SDK is: "Encountered a null value when resolving configuration attributes. This is commonly caused by concurrent modifications to non-thread-safe types. Ensure you're synchronizing access to all non-thread-safe types."

On the Iceberg side we are using S3FileIO with S3 Access Grants configured according to the docs, and the S3 client is built via S3Client.builder() with S3FileIOProperties.applyS3AccessGrantsConfigurations(...) (or equivalent).

We have already tried the following combinations, and the NPE persists in all of them:

- **Iceberg versions:** 1.7.2, then upgraded to 1.10.0. The NPE persists in both.
- **AWS SDK v2 versions:** 2.24.6, 2.30.31, and 2.32.1. The NPE persists across all.
- **S3 Access Grants plugin versions:** 2.0.2 and 2.3.0. The NPE persists in both.
- **Spark / JDK combinations:** Spark 3.5.6 with JDK 17, and Spark 4.0.1 (JDK 21 inside the image). Same NPE in both.
- **Parallelism tuning:** Reducing spark.sql.shuffle.partitions / spark.default.parallelism can change how often the NPE occurs, but does not reliably remove it on large tables.

Could you please help me understand the issue?

**1. Known issue?** Are you aware of any known concurrency problems between Iceberg's S3FileIO S3 Access Grants integration and AWS SDK v2 / aws-s3-accessgrants-java-plugin that could cause AttributeMap$Builder.resolveValue to throw an NPE under high Spark parallelism?

**2. Recommended version matrix?** Is there a recommended or validated combination of the following for running S3 Access Grants with S3FileIO in a high-concurrency Spark environment?

- Iceberg version
- AWS SDK v2 version
- aws-s3-accessgrants-java-plugin version

**3. Client factory / configuration guidance?** From Iceberg's side, is there any specific guidance on how the S3 client factory should be implemented (or additional S3FileIO / S3AG configuration) to avoid shared, non-thread-safe state that might trigger this NPE?

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
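
To make question 3 concrete, here is a minimal sketch of the general pattern the SDK message points at: build a shared object exactly once under a lock, then hand only the finished object to worker threads. `BuildOnce` and `BuildOnceDemo` are hypothetical names used for illustration; they are not part of Iceberg, the AWS SDK, or the Access Grants plugin, and the placeholder `Object` stands in for whatever a real client factory would construct. Whether this pattern has any effect on the NPE reported above is untested.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: runs the factory at most once, even under concurrent get() calls. */
final class BuildOnce<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile so other threads see the fully built object

    BuildOnce(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // Any mutable, non-thread-safe builder state stays inside this lock.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

/** Hypothetical demo: many threads request the shared object at the same moment. */
final class BuildOnceDemo {
    /** Returns how many times the factory ran when `threads` callers raced on get(). */
    static int countBuilds(int threads) {
        AtomicInteger builds = new AtomicInteger();
        BuildOnce<Object> holder = new BuildOnce<>(() -> {
            builds.incrementAndGet();
            return new Object(); // placeholder for an expensive client
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // line all workers up before releasing them together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                holder.get();
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return builds.get();
    }
}
```

The design choice here is that the lock covers only construction; once the object is published, readers take the fast path without synchronizing. This helps only if the finished object is itself safe to share across threads.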
