anthonysgro opened a new issue, #11116: URL: https://github.com/apache/iceberg/issues/11116
### Apache Iceberg version

1.5.0

### Query engine

Spark

### Please describe the bug 🐞

Receiving this stack trace when reading from a cross-account Iceberg Glue table:

```
diagnostics: User class threw exception: java.lang.NoClassDefFoundError: software/amazon/awssdk/http/urlconnection/UrlConnectionHttpClient
    at org.apache.iceberg.aws.AwsClientFactories.configureHttpClientBuilder(AwsClientFactories.java:160)
    at org.apache.iceberg.aws.AwsClientFactories$DefaultAwsClientFactory.glue(AwsClientFactories.java:111)
    at org.apache.iceberg.aws.glue.GlueCatalog.initialize(GlueCatalog.java:141)
    at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:200)
    at org.apache.iceberg.CatalogUtil.buildIcebergCatalog(CatalogUtil.java:237)
    at org.apache.iceberg.spark.SparkCatalog.buildIcebergCatalog(SparkCatalog.java:119)
    at org.apache.iceberg.spark.SparkCatalog.initialize(SparkCatalog.java:411)
    at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:65)
    at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:53)
    at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
    at org.apache.spark.sql.connector.catalog.CatalogManager.catalog(CatalogManager.scala:53)
    at org.apache.spark.sql.connector.catalog.LookupCatalog$CatalogAndIdentifier$.unapply(LookupCatalog.scala:122)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$resolveRelation$1(Analyzer.scala:1314)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$resolveRelation(Analyzer.scala:1313)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$14.applyOrElse(Analyzer.scala:1163)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$14.applyOrElse(Analyzer.scala:1124)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:33)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1124)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1083)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:239)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:236)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:319)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:368)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:319)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:309)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:309)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:195)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:191)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:260)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:256)
    at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:190)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:256)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:219)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:243)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:242)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:80)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:256)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:625)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:256)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:255)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:69)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:93)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:91)
    at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:608)
    at com.amazon.baileyaggregation.spark.SparkIcebergTableDao.readAsDataFrame(SparkIcebergTableDao.scala:12)
    at com.amazon.baileyaggregation.spark.Iceberg.AyclRoyaltyJournalsIcebergDao.readDataFrame(AyclRoyaltyJournalsIcebergDao.scala:18)
    at com.amazon.baileyaggregation.spark.readers.AyclRoyaltyJournalsReader.read(AyclRoyaltyJournalsReader.scala:24)
    at com.amazon.baileyaggregation.spark.readers.AyclRoyaltyJournalsReader.readJournals(AyclRoyaltyJournalsReader.scala:57)
    at com.amazon.baileyaggregation.reports.ExceptionReportsDatasetGenerator$.generateAyclEvents$1(ExceptionReportsDatasetGenerator.scala:78)
    at com.amazon.baileyaggregation.reports.ExceptionReportsDatasetGenerator$.generateExceptionReportsDataset(ExceptionReportsDatasetGenerator.scala:60)
    at com.amazon.baileyaggregation.reports.ExceptionReportsDatasetGenerator$.main(ExceptionReportsDatasetGenerator.scala:41)
    at com.amazon.baileyaggregation.reports.ExceptionReportsDatasetGenerator.main(ExceptionReportsDatasetGenerator.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741)
Caused by: java.lang.ClassNotFoundException: software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    ... 78 more

ApplicationMaster host: ip-10-0-1-95.ec2.internal
ApplicationMaster RPC port: 40951
queue: default
start time: 1726089925586
final status: FAILED
tracking URL: http://ip-10-0-1-254.ec2.internal:20888/proxy/application_1726089801574_0001/
user: hadoop
```

My SparkSession configuration:

```
def crossAccountAyclIcebergSession(appName: String, domain: String): SparkSession = {
  SparkSession
    .builder
    .config(new SparkConf()
      .setAppName(appName)
      .setMaster("yarn")
      .set("spark.driver.extraJavaOptions", "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35")
      .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35")
      .set("spark.sql.hive.metastore.jars", "maven")
      .set("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
    )
    .config(s"spark.sql.catalog.${CATALOG}.http-client.type", "apache")
    .config(s"spark.sql.catalog.${CATALOG}", "org.apache.iceberg.spark.SparkCatalog")
    .config(s"spark.sql.catalog.${CATALOG}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config(s"spark.sql.catalog.${CATALOG}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config(s"spark.sql.catalog.${CATALOG}.warehouse", "my-warehouse")
    .config(s"spark.sql.catalog.${CATALOG}.glue.id", "my-account")
    .config(s"spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config(s"spark.sql.iceberg.vectorization.enabled", "true")
    .enableHiveSupport()
    .getOrCreate()
}
```

Really not sure why this is happening. The only thing I can think of is that I am using two Spark sessions (though I am using the one shown above specifically to read from the cross-account table). This works with another entrypoint in my Spark project, so I can only assume something odd is going on in Iceberg. I saw at https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java#L166 that the only way I can be requesting the `UrlConnectionHttpClient` is if `clientType` equals `HttpClientProperties.CLIENT_TYPE_URLCONNECTION`, but I have explicitly set it to `"apache"` in my config. Additionally, I tried adding the `UrlConnectionHttpClient` dependency and just dealing with it, but that leads to other `NoSuchMethodError` issues, so I am back to square one.
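In case it helps with reproduction: below is a minimal isolation sketch (not my actual job code) that initializes `GlueCatalog` directly with the same properties my Spark catalog config would forward, to check whether `http-client.type` actually reaches the AWS client factory. The object name, catalog name, and warehouse/account values are placeholders.

```
// Hypothetical repro sketch: initialize GlueCatalog outside Spark with the same
// properties that spark.sql.catalog.<name>.* would forward. Assumes iceberg-aws,
// its Glue/S3 SDK v2 dependencies, and the SDK apache-client are on the classpath.
import org.apache.iceberg.CatalogProperties
import org.apache.iceberg.aws.glue.GlueCatalog
import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters._ on 2.12

object GlueCatalogRepro {
  def main(args: Array[String]): Unit = {
    val catalog = new GlueCatalog()
    catalog.initialize(
      "cross_account", // placeholder catalog name
      Map(
        CatalogProperties.WAREHOUSE_LOCATION -> "my-warehouse", // placeholder
        CatalogProperties.FILE_IO_IMPL -> "org.apache.iceberg.aws.s3.S3FileIO",
        "http-client.type" -> "apache",   // the setting that appears to be ignored
        "glue.id" -> "my-account"         // placeholder cross-account Glue catalog ID
      ).asJava
    )
    // If this also throws NoClassDefFoundError for UrlConnectionHttpClient,
    // the problem is in catalog initialization itself, not in how Spark
    // forwards the catalog options.
    println(catalog.listNamespaces())
  }
}
```

If the direct call does honor the apache client, that would suggest the option is being dropped somewhere between the Spark config and `AwsClientFactories`.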
### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org