jpohanka opened a new issue, #7737:
URL: https://github.com/apache/iceberg/issues/7737
### Apache Iceberg version
1.2.1 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I tried to use Apache Iceberg 1.2.1 on Spark 3.3.0 (dockerized
deployment, cluster deployment mode) by following the instructions in the
documentation.
However, I quickly ran into several dependency issues, discussed below.
## Issues with `software.amazon.awssdk:bundle:2.20.18`
The first set of issues that I encountered was related to the
`software.amazon.awssdk:bundle:2.20.18` package.
**Used packages** in the `spark.jars.packages` configuration variable:
- `org.apache.hadoop:hadoop-aws:3.3.4`
- `software.amazon.awssdk:url-connection-client:2.20.18`
- `software.amazon.awssdk:bundle:2.20.18`
- `org.apache.spark:spark-hive_2.12:3.3.0`
- `org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.1`
- `org.apache.iceberg:iceberg-aws:1.2.1`
When using this setup, I kept receiving the following error:
```
An error was encountered:
An error occurred while calling o433.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 20 times, most recent failure: Lost task 0.19 in stage 6.0 (TID 25) (10.254.0.55 executor 0): java.lang.NoSuchMethodError: 'void software.amazon.awssdk.utils.IoUtils.closeQuietly(java.lang.AutoCloseable, software.amazon.awssdk.thirdparty.org.slf4j.Logger)'
    at software.amazon.awssdk.core.util.SdkUserAgent.kotlinVersion(SdkUserAgent.java:182)
    at software.amazon.awssdk.core.util.SdkUserAgent.getAdditionalJvmLanguages(SdkUserAgent.java:132)
    at software.amazon.awssdk.core.util.SdkUserAgent.getUserAgent(SdkUserAgent.java:116)
    at software.amazon.awssdk.core.util.SdkUserAgent.initializeUserAgent(SdkUserAgent.java:95)
    at software.amazon.awssdk.core.util.SdkUserAgent.<init>(SdkUserAgent.java:65)
    at software.amazon.awssdk.core.util.SdkUserAgent.create(SdkUserAgent.java:72)
    at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.lambda$mergeGlobalDefaults$3(SdkDefaultClientBuilder.java:276)
    at software.amazon.awssdk.utils.builder.SdkBuilder.applyMutation(SdkBuilder.java:61)
    at software.amazon.awssdk.core.client.config.SdkClientConfiguration.merge(SdkClientConfiguration.java:66)
    at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.mergeGlobalDefaults(SdkDefaultClientBuilder.java:271)
    at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.syncClientConfiguration(SdkDefaultClientBuilder.java:180)
    at software.amazon.awssdk.services.s3.DefaultS3ClientBuilder.buildClient(DefaultS3ClientBuilder.java:36)
    at software.amazon.awssdk.services.s3.DefaultS3ClientBuilder.buildClient(DefaultS3ClientBuilder.java:25)
    at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.build(SdkDefaultClientBuilder.java:150)
    at org.apache.iceberg.aws.AwsClientFactories$DefaultAwsClientFactory.s3(AwsClientFactories.java:107)
    at org.apache.iceberg.aws.s3.S3FileIO.client(S3FileIO.java:326)
    at org.apache.iceberg.aws.s3.S3FileIO.newInputFile(S3FileIO.java:124)
    at org.apache.iceberg.spark.source.BaseReader.toEncryptedInputFile(BaseReader.java:197)
```
**Solution** - replace `software.amazon.awssdk:bundle:2.20.18` with the
packages for the specific AWS services:
- `org.apache.hadoop:hadoop-aws:3.3.4`
- `software.amazon.awssdk:url-connection-client:2.20.18`
- `software.amazon.awssdk:core:2.20.18`
- `software.amazon.awssdk:glue:2.20.18`
- `software.amazon.awssdk:s3:2.20.18`
- `software.amazon.awssdk:sts:2.20.18`
- `software.amazon.awssdk:utils:2.20.18`
- `org.apache.spark:spark-hive_2.12:3.3.0`
- `org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.1`
- `org.apache.iceberg:iceberg-aws:1.2.1`
With this package setup, I stopped getting the error involving
`software.amazon.awssdk.thirdparty.org.slf4j.Logger`.
## Issues with `org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.1`
After fixing the issue with
`software.amazon.awssdk.thirdparty.org.slf4j.Logger`, I started to receive
the following error involving the `org.apache.iceberg.BaseFile` class:
```
An error was encountered:
An error occurred while calling o450.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 20 times, most recent failure: Lost task 0.19 in stage 5.0 (TID 65) (10.254.0.80 executor 0): java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = -655543782470255741, local class serialVersionUID = 2686776604825259963
    at java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
    at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2014)
    at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1864)
    at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2014)
    at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1864)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2195)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
    at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
```
This issue was more difficult to solve because it was non-deterministic:
sometimes the Spark jobs worked, and sometimes they failed.
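An `InvalidClassException` with mismatched `serialVersionUID` values means the sender and the receiver loaded different builds of the same class. One possible way to check for that is to look for a class file that appears in more than one jar on the classpath. The following is a minimal sketch, not part of Iceberg or Spark: the function name `find_duplicate_classes` is hypothetical, and the directory to scan depends on the deployment (`~/.ivy2/jars` is assumed to be where Spark places resolved `spark.jars.packages` artifacts by default).

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_classes(jar_dir, class_suffix):
    """Return {class entry: [jar names]} for entries found in more than one jar.

    jar_dir      -- directory containing the *.jar files to inspect (assumption:
                    all jars of interest sit directly in this directory)
    class_suffix -- path suffix of the class file to look for, for example
                    "org/apache/iceberg/BaseFile.class"
    """
    owners = defaultdict(list)
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith(class_suffix):
                    owners[name].append(jar.name)
    # Keep only class entries that are provided by at least two jars.
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

Calling `find_duplicate_classes("/path/to/jars", "org/apache/iceberg/BaseFile.class")` returns an empty dict when only one jar provides the class. A non-empty result lists the jars that each ship their own copy, which would be consistent with the intermittent failures described above.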
**Solution** - the solution to this issue was to remove the
`org.apache.iceberg:iceberg-aws:1.2.1` package from the `spark.jars.packages`
configuration variable and use just the following packages:
- `org.apache.hadoop:hadoop-aws:3.3.4`
- `software.amazon.awssdk:url-connection-client:2.20.18`
- `software.amazon.awssdk:core:2.20.18`
- `software.amazon.awssdk:glue:2.20.18`
- `software.amazon.awssdk:s3:2.20.18`
- `software.amazon.awssdk:sts:2.20.18`
- `software.amazon.awssdk:utils:2.20.18`
- `org.apache.spark:spark-hive_2.12:3.3.0`
- `org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.1`
After this fix, Iceberg works well with the Spark cluster and Glue. It seems
that this was a case of class shading.
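For reference, the working package list can be passed to Spark as a comma-separated `spark.jars.packages` value. The following PySpark sketch only illustrates that step: the helper names `packages_conf` and `build_session` and the application name are hypothetical, and the catalog and S3 settings required for Glue are deliberately left out.

```python
# Package list copied from the working setup above (no iceberg-aws, no AWS bundle).
PACKAGES = [
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "software.amazon.awssdk:url-connection-client:2.20.18",
    "software.amazon.awssdk:core:2.20.18",
    "software.amazon.awssdk:glue:2.20.18",
    "software.amazon.awssdk:s3:2.20.18",
    "software.amazon.awssdk:sts:2.20.18",
    "software.amazon.awssdk:utils:2.20.18",
    "org.apache.spark:spark-hive_2.12:3.3.0",
    "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.1",
]


def packages_conf(packages):
    """Join Maven coordinates into the comma-separated form used by spark.jars.packages."""
    return ",".join(packages)


def build_session(app_name="iceberg-glue-example"):
    """Create a SparkSession with the package list applied (app name is illustrative)."""
    # Imported lazily so packages_conf() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", packages_conf(PACKAGES))
        .getOrCreate()
    )
```

Keeping the coordinates in a single list makes it easier to see at a glance that neither `iceberg-aws` nor the AWS SDK `bundle` is present.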
## Next steps
- The documentation should be updated to mitigate these issues for newcomers
to Apache Iceberg.
- The Iceberg packages should be investigated to find the cause of these
issues.
## References
- https://github.com/apache/iceberg/issues/5611
- https://github.com/apache/iceberg/issues/5970
- https://medium.com/@akhaku/java-class-shadowing-and-shading-9439b0eacb13