paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1970719190
@RussellSpitzer thanks a lot for helping with this. I want to give a bit more detail (Russell and I discussed this in the Iceberg Slack).

This is how I load my catalog for the RewriteDataFiles action (**I tried this with both Nessie and Hive, so the catalog type does not matter**):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

// The catalog is built by hand, so only the properties set below are used.
// No io-impl is set here, so HiveCatalog falls back to its default HadoopFileIO.
HiveCatalog catalog = new HiveCatalog();
catalog.setConf(spark.sparkContext().hadoopConfiguration());

Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "s3a://obs-zdp-warehouse-stage-mz/");
properties.put("uri", "thrift://******");
catalog.initialize("hive", properties);

// args[0] is expected in "<schema>.<table>" form
String tableName = args[0];
String[] schemaAndTable = tableName.split("\\.");
TableIdentifier tableId = TableIdentifier.of(schemaAndTable);
Table table = catalog.loadTable(tableId);
```

For the rewrite_data_files procedure, the table is loaded like this:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;

// Resolves the table through the Spark catalog configured on the SparkSession
Table table = Spark3Util.loadIcebergTable(spark, tableNameStr);
```

So with the RewriteDataFiles action I construct the catalog myself from scratch and supply every setting manually. The rewrite_data_files procedure instead takes all of its configuration from the SparkSession settings:

```
spark.sql.catalog.iceberg_catalog.io-impl: org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.iceberg_catalog.s3.access-key-id: (redacted)
spark.sql.catalog.iceberg_catalog.s3.endpoint: https://redacted
spark.sql.catalog.iceberg_catalog.s3.secret-access-key: (redacted)
```

So for the procedure I was supplying S3FileIO, and that is what caused this bug. As soon as I removed these 4 settings from my SparkSession conf, the bug went away, because the rewrite_data_files procedure then used HadoopFileIO instead of S3FileIO.

@danielcweeks FYI.
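The following is a rough, stdlib-only illustration of why the two paths end up with different FileIO implementations. It is a sketch, not Iceberg or Spark code. `catalogProperties` and `effectiveFileIO` are hypothetical helpers. They assume that per-catalog Spark settings under `spark.sql.catalog.<name>.` are passed to the catalog with that prefix stripped. They also assume that a catalog with no `io-impl` property falls back to HadoopFileIO. The warehouse and URI values are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogPropsSketch {

    // Hypothetical helper (not a Spark or Iceberg API): keeps only the entries under
    // "spark.sql.catalog.<catalogName>." and strips that prefix from their keys.
    static Map<String, String> catalogProperties(Map<String, String> sparkConf, String catalogName) {
        String prefix = "spark.sql.catalog." + catalogName + ".";
        Map<String, String> props = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                props.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return props;
    }

    // Hypothetical helper. Assumption: without "io-impl" the catalog uses HadoopFileIO;
    // otherwise it uses the configured class.
    static String effectiveFileIO(Map<String, String> catalogProps) {
        return catalogProps.getOrDefault("io-impl", "org.apache.iceberg.hadoop.HadoopFileIO");
    }

    public static void main(String[] args) {
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.sql.catalog.iceberg_catalog", "org.apache.iceberg.spark.SparkCatalog");
        sparkConf.put("spark.sql.catalog.iceberg_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        sparkConf.put("spark.sql.catalog.iceberg_catalog.s3.endpoint", "https://example.invalid");

        // Procedure path: catalog properties are derived from the SparkSession config
        Map<String, String> procedureProps = catalogProperties(sparkConf, "iceberg_catalog");

        // Action path: only what is passed to catalog.initialize(...) by hand (placeholder values)
        Map<String, String> actionProps = new HashMap<>();
        actionProps.put("warehouse", "s3a://placeholder-bucket/");
        actionProps.put("uri", "thrift://placeholder-metastore:9083");

        System.out.println("procedure: " + effectiveFileIO(procedureProps));
        System.out.println("action:    " + effectiveFileIO(actionProps));
    }
}
```

Running `main` should print S3FileIO for the procedure path and HadoopFileIO for the action path. This mirrors the difference described above, under the stated assumptions.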