picarro-sdivakar opened a new issue, #13691: URL: https://github.com/apache/iceberg/issues/13691
### Apache Iceberg version

None

### Query engine

None

### Please describe the bug 🐞

I'm trying to use the Kafka Connect Iceberg sink connector built from the latest main branch (v1.10.0-SNAPSHOT), following the instructions in the [iceberg-kafka-connect-runtime](https://github.com/apache/iceberg/tree/main/kafka-connect/kafka-connect-runtime) module.

After unzipping the built connector JARs into the Kafka Connect plugin path and starting a sink connector with a Hadoop catalog (`iceberg.catalog.type=hadoop`), the connector fails with the following error:

    java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:43)
        ...
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration2.Configuration
        at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)

and

    java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.UserGroupInformation
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
        ...

#### Steps to Reproduce

1. Check out https://github.com/apache/iceberg and build the connector with `./gradlew build -x test -x integrationTest`.
2. Copy the zip output from `build/distributions/iceberg-kafka-connect-runtime-1.10.0-SNAPSHOT.zip` to the Kafka Connect plugin path and unzip it.
3. Start the connector by submitting the following configuration:

       {
         "name": "iceberg-sink",
         "config": {
           "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
           "tasks.max": "1",
           "topics": "company.product.conc.level2",
           "iceberg.catalog.name": "hadoop",
           "iceberg.catalog.type": "hadoop",
           "iceberg.catalog.warehouse": "s3a://domain/product/iceberg/",
           "iceberg.catalog.hadoop.fs.s3a.endpoint": "<>:443",
           "iceberg.catalog.hadoop.fs.s3a.access.key": "<>",
           "iceberg.catalog.hadoop.fs.s3a.secret.key": "<>",
           "iceberg.catalog.hadoop.fs.s3a.path.style.access": "<>",
           "iceberg.catalog.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
           "flush.size": "500",
           "key.converter": "org.apache.kafka.connect.storage.StringConverter",
           "value.converter": "com.company.cdp.avroconverter.CustomAvroConverter",
           "value.converter.avro.schema.file": "/etc/kafka-connect/data.avsc",
           "value.converter.schemas.enable": "true",
           "iceberg.tables": "db.processed_concentration",
           "iceberg.table.schema.file": "/etc/kafka-connect/data.avsc"
         }
       }

#### Analysis

It seems that the commons-configuration2 transitive dependency required by `org.apache.hadoop.security.UserGroupInformation` is not bundled into the connector distribution. Since Kafka Connect uses isolated classloading per plugin, these dependencies must be explicitly included.

#### Suggested Fix

Could we consider one of the following?

- Adding the missing commons-configuration2 dependency (and any other Hadoop transitive dependencies required at runtime) to the `runtimeClasspath` of iceberg-kafka-connect-runtime.
- Publishing a shaded uber-JAR (similar to how some other connectors are bundled).
- Updating the documentation to guide users to manually include the required Hadoop dependencies in the plugin path.

Please advise on the preferred direction. Happy to contribute a patch if needed.
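
To help confirm the analysis, here is a minimal Python sketch that scans every JAR under a plugin directory for a given class file. It is a diagnostic aid only, not part of Iceberg or Kafka Connect. The helper names `class_entry` and `find_class_in_jars` are hypothetical, and it assumes the unzipped connector consists of ordinary `.jar` files somewhere below the directory passed in.

```python
import zipfile
from pathlib import Path
from typing import List


def class_entry(class_name: str) -> str:
    """Convert a dotted class name to the path it would have inside a JAR."""
    return class_name.replace(".", "/") + ".class"


def find_class_in_jars(plugin_dir: str, class_name: str) -> List[str]:
    """Return the sorted paths of JARs under plugin_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    # Walk the directory recursively, since the distribution may nest JARs in lib/.
    for jar in sorted(Path(plugin_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are not valid archives instead of aborting the scan.
            continue
    return hits
```

For example, calling `find_class_in_jars("<plugin path>/iceberg-kafka-connect-runtime-1.10.0-SNAPSHOT", "org.apache.commons.configuration2.Configuration")` (the directory name is an assumption based on the zip name above) and getting an empty list would be consistent with the `ClassNotFoundException` reported by `PluginClassLoader`.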
### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
