markhoerth commented on issue #10707:
URL: https://github.com/apache/gravitino/issues/10707#issuecomment-4232009152

   After investigation, the working JAR combination for Gravitino 1.2.0 with 
standard Apache Spark 3.5.5 is:
   
   - `iceberg-spark-runtime-3.5_2.12-1.10.0.jar`
   - `iceberg-aws-bundle-1.10.0.jar`
   
   Both JARs are required, and their versions must match. The 
`iceberg-spark-runtime` JAR resolves the original `NoClassDefFoundError: 
ExtendedDataSourceV2Strategy` error. The `iceberg-aws-bundle` JAR is 
additionally required for `S3FileIO`: without it, catalog initialization fails 
with `NoClassDefFoundError: 
software/amazon/awssdk/services/s3/model/S3Exception` even when the 
spark-runtime JAR is present.
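
   The "both JARs, same version" rule can be checked before Spark starts. 
The following is a minimal sketch, not part of Gravitino or Iceberg: the 
function name `check_iceberg_jars` and the filename pattern are assumptions 
made for illustration, based only on the two JAR names listed above.

```python
import re

# Assumed naming scheme, taken from the two JAR names in this comment:
#   iceberg-spark-runtime-<spark>_<scala>-<iceberg version>.jar
#   iceberg-aws-bundle-<iceberg version>.jar
_JAR_PATTERN = re.compile(
    r"^(iceberg-spark-runtime|iceberg-aws-bundle)-(?:.+-)?(\d+\.\d+\.\d+)\.jar$"
)

REQUIRED = {"iceberg-spark-runtime", "iceberg-aws-bundle"}


def check_iceberg_jars(jar_names):
    """Return the shared Iceberg version of the two required JARs.

    Raises ValueError if either JAR is missing or the versions differ.
    (Hypothetical helper; not an API of any library.)
    """
    versions = {}
    for name in jar_names:
        match = _JAR_PATTERN.match(name)
        if match:
            versions[match.group(1)] = match.group(2)

    missing = REQUIRED - versions.keys()
    if missing:
        raise ValueError(f"missing required JAR(s): {sorted(missing)}")
    if len(set(versions.values())) != 1:
        raise ValueError(f"Iceberg versions differ: {versions}")
    return versions["iceberg-spark-runtime"]


if __name__ == "__main__":
    print(check_iceberg_jars([
        "iceberg-spark-runtime-3.5_2.12-1.10.0.jar",
        "iceberg-aws-bundle-1.10.0.jar",
    ]))
```

   Passing the names found in the Spark `jars` directory to this function 
would flag a missing `iceberg-aws-bundle` or a version mismatch up front, 
rather than at catalog initialization.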
   
   **Why earlier versions failed:**
   Versions 1.4.3 through 1.6.x of `iceberg-spark-runtime` failed in one of two 
ways: they either bundled conflicting AWS SDK v1 classes (causing a SIGSEGV 
when combined with `iceberg-aws-bundle`) or were otherwise incompatible with 
`iceberg-aws-bundle`. As of 1.10.0 this conflict is resolved, and both JARs 
coexist cleanly on the classpath.
   
   **Verified against:**
   - Gravitino 1.2.0 Spark connector 
(`gravitino-spark-connector-runtime-3.5_2.12-1.2.0.jar`)
   - Apache Spark 3.5.5 (standard distribution, `apache/spark:3.5.5` Docker 
image)
   - `enableIcebergSupport=true`
   - MinIO as S3-compatible storage
   - Full read/write of a 9.5M row Iceberg table confirmed via both Trino and 
Spark SQL
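
   For anyone reproducing the setup listed above, here is a rough sketch of 
how the `spark-sql` invocation might be assembled. The JAR directory, 
Gravitino URI, and metalake name are placeholders, and the helper 
`build_spark_sql_command` is hypothetical. The `spark.sql.gravitino.*` 
property names and the plugin class are written as I understand them from the 
Gravitino Spark connector docs; please verify them against the docs for your 
version.

```python
from pathlib import PurePosixPath

# JAR names exactly as listed in this comment.
JARS = [
    "gravitino-spark-connector-runtime-3.5_2.12-1.2.0.jar",
    "iceberg-spark-runtime-3.5_2.12-1.10.0.jar",
    "iceberg-aws-bundle-1.10.0.jar",
]


def build_spark_sql_command(jar_dir, gravitino_uri, metalake):
    """Build a spark-sql argument list (hypothetical helper).

    jar_dir, gravitino_uri and metalake are caller-supplied placeholders.
    """
    jar_paths = ",".join(str(PurePosixPath(jar_dir) / jar) for jar in JARS)

    # Property names assumed from the Gravitino Spark connector docs.
    conf = {
        "spark.plugins":
            "org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin",
        "spark.sql.gravitino.uri": gravitino_uri,
        "spark.sql.gravitino.metalake": metalake,
        "spark.sql.gravitino.enableIcebergSupport": "true",
    }

    command = ["spark-sql", "--jars", jar_paths]
    for key, value in conf.items():
        command += ["--conf", f"{key}={value}"]
    return command


if __name__ == "__main__":
    # Placeholder values; substitute your own environment.
    print(" ".join(build_spark_sql_command(
        "/opt/spark/extra-jars", "http://localhost:8090", "my_metalake")))
```

   The S3/MinIO endpoint and credentials are configured on the Gravitino 
catalog side and are deliberately left out of this sketch.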
   
   **Documentation gap still open:**
   The docs state "download Iceberg Spark runtime jar to Spark classpath" but 
specify no version and do not mention that `iceberg-aws-bundle` is also 
required. This makes it impossible to reproduce a working setup from the docs 
alone. I recommend updating the docs to specify:
   1. The exact compatible `iceberg-spark-runtime` version
   2. That `iceberg-aws-bundle` at the same version is also required for 
S3/object storage
   3. That both JARs must be at the same Iceberg version to avoid classpath 
conflicts
   
   Thanks to @danhuawang for the pointer to 1.10.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
