paulpaul1076 commented on issue #9679:
URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1970719190
@RussellSpitzer thanks a lot for helping with this. I want to give a bit more
detail (Russell and I discussed this in the Iceberg Slack).
This is how I load my catalog for the RewriteDataFiles action (**I tried
this with both Nessie and Hive, so the catalog type does not matter**):
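```
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

// Build the catalog by hand; only warehouse and uri are set (no io-impl).
HiveCatalog catalog = new HiveCatalog();
catalog.setConf(spark.sparkContext().hadoopConfiguration());
Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "s3a://obs-zdp-warehouse-stage-mz/");
properties.put("uri", "thrift://******");
catalog.initialize("hive", properties);

// args[0] is expected in the form "schema.table"
String tableName = args[0];
String[] schemaAndTable = tableName.split("\\.");
TableIdentifier tableId = TableIdentifier.of(schemaAndTable);
Table table = catalog.loadTable(tableId);
```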
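For the rewrite_data_files procedure, this is how the table (and its catalog)
is loaded:
```
import org.apache.iceberg.spark.Spark3Util;

// Resolves the catalog from the SparkSession configuration
Table table = Spark3Util.loadIcebergTable(spark, tableNameStr);
```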
As you can see, for the RewriteDataFiles action I construct the catalog
myself from scratch and supply every setting manually, whereas the
rewrite_data_files procedure takes all of its configuration from the
SparkSession settings:
```
spark.sql.catalog.iceberg_catalog.io-impl: org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.iceberg_catalog.s3.access-key-id: (redacted)
spark.sql.catalog.iceberg_catalog.s3.endpoint: https://redacted
spark.sql.catalog.iceberg_catalog.s3.secret-access-key: (redacted)
```
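Spark hands each configured catalog its `spark.sql.catalog.<name>.*` entries with that prefix stripped, which is how `io-impl` and the `s3.*` keys above reach the procedure's catalog. A minimal, simplified sketch of that prefix stripping follows; `CatalogConfSketch` and `catalogProperties` are hypothetical illustrations, not Spark or Iceberg code.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogConfSketch {
    // Hypothetical helper: a simplified model of how Spark turns
    // "spark.sql.catalog.<name>.<key>" entries into plain "<key>" options
    // for catalog <name>. Not Spark's actual implementation.
    static Map<String, String> catalogProperties(Map<String, String> sparkConf, String catalogName) {
        String prefix = "spark.sql.catalog." + catalogName + ".";
        Map<String, String> props = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                props.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // The catalog class itself has no trailing key, so it is not an option
        conf.put("spark.sql.catalog.iceberg_catalog", "org.apache.iceberg.spark.SparkCatalog");
        conf.put("spark.sql.catalog.iceberg_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        conf.put("spark.sql.catalog.iceberg_catalog.s3.endpoint", "https://redacted");
        // Unrelated Spark setting, ignored by the helper
        conf.put("spark.sql.shuffle.partitions", "200");
        // Prints the options the procedure's catalog would receive
        System.out.println(catalogProperties(conf, "iceberg_catalog"));
    }
}
```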
So for the procedure I was supplying S3FileIO, and that is what caused this
bug. As soon as I removed these 4 settings from my SparkSession conf, the bug
went away, because the rewrite_data_files procedure switched from S3FileIO
to HadoopFileIO.
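The difference comes down to whether the catalog properties contain `io-impl`. A minimal sketch of that fallback, assuming (as observed above for this setup) that the catalog defaults to `HadoopFileIO` when `io-impl` is absent; `FileIoFallbackSketch` and `resolveFileIo` are hypothetical names, not Iceberg API.

```java
import java.util.HashMap;
import java.util.Map;

public class FileIoFallbackSketch {
    static final String IO_IMPL = "io-impl";
    // Assumed default, matching the HadoopFileIO fallback observed in this comment
    static final String DEFAULT_IO = "org.apache.iceberg.hadoop.HadoopFileIO";

    // Hypothetical helper: use io-impl if present, otherwise the Hadoop default
    static String resolveFileIo(Map<String, String> catalogProps) {
        return catalogProps.getOrDefault(IO_IMPL, DEFAULT_IO);
    }

    public static void main(String[] args) {
        // Catalog built by hand for the RewriteDataFiles action: no io-impl
        Map<String, String> actionProps = new HashMap<>();
        actionProps.put("warehouse", "s3a://obs-zdp-warehouse-stage-mz/");
        actionProps.put("uri", "thrift://******");

        // Catalog options derived from SparkSession settings for the procedure
        Map<String, String> procedureProps = new HashMap<>();
        procedureProps.put(IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");

        System.out.println("action:    " + resolveFileIo(actionProps));    // HadoopFileIO
        System.out.println("procedure: " + resolveFileIo(procedureProps)); // S3FileIO
    }
}
```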
@danielcweeks FYI.