markhoerth commented on issue #10707: URL: https://github.com/apache/gravitino/issues/10707#issuecomment-4232009152
After investigation, the working JAR combination for Gravitino 1.2.0 with standard Apache Spark 3.5.5 is:

- `iceberg-spark-runtime-3.5_2.12-1.10.0.jar`
- `iceberg-aws-bundle-1.10.0.jar`

Both JARs are required and versions must match. The `iceberg-spark-runtime` JAR resolves the original `NoClassDefFoundError: ExtendedDataSourceV2Strategy` error. The `iceberg-aws-bundle` JAR is additionally required for `S3FileIO`: without it, catalog initialization fails with `NoClassDefFoundError: software/amazon/awssdk/services/s3/model/S3Exception` even when the spark-runtime is present.

**Why earlier versions failed:**

Versions 1.4.3–1.6.x of `iceberg-spark-runtime` either bundled conflicting AWS SDK v1 classes (causing SIGSEGV when combined with `iceberg-aws-bundle`) or had `iceberg-aws-bundle` incompatibilities. At 1.10.0 this conflict is resolved, and both JARs coexist cleanly on the classpath.

**Verified against:**

- Gravitino 1.2.0 Spark connector (`gravitino-spark-connector-runtime-3.5_2.12-1.2.0.jar`)
- Apache Spark 3.5.5 (standard distribution, `apache/spark:3.5.5` Docker image)
- `enableIcebergSupport=true`
- MinIO as S3-compatible storage
- Full read/write of a 9.5M row Iceberg table confirmed via both Trino and Spark SQL

**Documentation gap still open:**

The docs state "download Iceberg Spark runtime jar to Spark classpath" with no version specified and no mention that `iceberg-aws-bundle` is also required. This makes it impossible to reproduce a working setup from the docs alone. Recommend updating to specify:

1. The exact compatible `iceberg-spark-runtime` version
2. That `iceberg-aws-bundle` at the same version is also required for S3/object storage
3. That both JARs must be at the same Iceberg version to avoid classpath conflicts

Thanks to @danhuawang for the pointer to 1.10.0.
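For anyone who wants to sanity-check a Spark `jars` directory before starting a session, here is a minimal sketch of the "both JARs present, same version" rule described above. It is not part of Gravitino or Iceberg: the function name `check_iceberg_jars` and the filename pattern `<artifact>-<version>.jar` are assumptions made for illustration, and the check only looks at file names, not at the classes inside the JARs.

```python
import re

# Artifact names as reported in this comment (Spark 3.5, Scala 2.12).
REQUIRED_ARTIFACTS = ("iceberg-spark-runtime-3.5_2.12", "iceberg-aws-bundle")


def check_iceberg_jars(jar_names):
    """Return the shared Iceberg version found in jar_names.

    Hypothetical helper: raises ValueError if a required JAR is missing
    or if the two JARs are at different versions.
    """
    versions = {}
    for name in jar_names:
        for artifact in REQUIRED_ARTIFACTS:
            # Assumed naming scheme: <artifact>-<dotted version>.jar
            pattern = re.escape(artifact) + r"-(\d+(?:\.\d+)*)\.jar"
            match = re.fullmatch(pattern, name)
            if match:
                versions[artifact] = match.group(1)

    missing = [a for a in REQUIRED_ARTIFACTS if a not in versions]
    if missing:
        raise ValueError("missing required JAR(s): " + ", ".join(missing))

    if len(set(versions.values())) != 1:
        raise ValueError("Iceberg JAR versions differ: " + repr(versions))

    return versions[REQUIRED_ARTIFACTS[0]]


if __name__ == "__main__":
    # File names from the verified setup; in practice this list could come
    # from os.listdir() on the Spark jars directory.
    jars = [
        "gravitino-spark-connector-runtime-3.5_2.12-1.2.0.jar",
        "iceberg-spark-runtime-3.5_2.12-1.10.0.jar",
        "iceberg-aws-bundle-1.10.0.jar",
    ]
    print(check_iceberg_jars(jars))
```

A name-based check like this would only catch the missing-bundle and version-mismatch cases; it cannot detect the bundled AWS SDK class conflicts seen with the older runtime versions.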
