Re: [I] rewrite_data_files procedure fails with Premature end of Content-Length when using S3 client [iceberg]

via GitHub Thu, 29 Feb 2024 01:17:00 -0800


paulpaul1076 commented on issue #9679:
URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1970719190


   @RussellSpitzer thanks a lot for helping with this. I want to add a few more 
details (Russell and I discussed this in the Iceberg Slack).
   
   This is how I load my catalog for the RewriteDataFiles action (**I tried 
this with both Nessie and Hive, so the kind of catalog you use does not matter**):
   
   ```
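   import java.util.HashMap;
   import java.util.Map;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.TableIdentifier;
   import org.apache.iceberg.hive.HiveCatalog;
   
   // Build the catalog by hand instead of relying on the SparkSession catalog settings.
   HiveCatalog catalog = new HiveCatalog();
   catalog.setConf(spark.sparkContext().hadoopConfiguration());
   
   // Only the warehouse location and metastore URI are supplied; no io-impl is set.
   Map<String, String> properties = new HashMap<>();
   properties.put("warehouse", "s3a://obs-zdp-warehouse-stage-mz/");
   properties.put("uri", "thrift://******");
   
   catalog.initialize("hive", properties);
   
   // args[0] is expected to look like "schema.table".
   String tableName = args[0];
   String[] schemaAndTable = tableName.split("\\.");
   TableIdentifier tableId = TableIdentifier.of(schemaAndTable);
   
   Table table = catalog.loadTable(tableId);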
   ```
   
   For the rewrite_data_files procedure, this is how the table is loaded:
   
   ```
   Table table = Spark3Util.loadIcebergTable(spark, tableNameStr);
   ```
   
   As you can see, for the RewriteDataFiles action I construct the catalog 
myself from scratch and supply all the settings manually, whereas the 
rewrite_data_files procedure takes all its configuration from the SparkSession 
settings:
   
   ```
   spark.sql.catalog.iceberg_catalog.io-impl: org.apache.iceberg.aws.s3.S3FileIO
   spark.sql.catalog.iceberg_catalog.s3.access-key-id: (redacted)
   spark.sql.catalog.iceberg_catalog.s3.endpoint: https://redacted
   spark.sql.catalog.iceberg_catalog.s3.secret-access-key: (redacted)
   ```
   
   So for the procedure I was supplying S3FileIO, and this is what triggered 
the bug. As soon as I removed these 4 settings from my SparkSession conf, the 
bug went away, because the rewrite_data_files procedure started using 
HadoopFileIO instead of S3FileIO.
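   
   The difference between the two setups can be modeled with a minimal sketch. 
It is not Iceberg's actual code: the class `FileIoResolution` and its two 
methods are hypothetical names, and the fallback to HadoopFileIO when `io-impl` 
is absent is an assumption based on the behavior described above. It only 
shows how catalog-scoped SparkSession settings turn into catalog properties and 
how the presence of `io-impl` changes the outcome:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Illustrative model only; the names here are hypothetical, not Iceberg APIs.
   class FileIoResolution {
       // Assumed fallback when no io-impl property is present.
       static final String DEFAULT_IO_IMPL = "org.apache.iceberg.hadoop.HadoopFileIO";
   
       // Collects the spark.sql.catalog.<catalogName>.* entries and strips the prefix.
       static Map<String, String> catalogProperties(Map<String, String> sparkConf, String catalogName) {
           String prefix = "spark.sql.catalog." + catalogName + ".";
           Map<String, String> props = new HashMap<>();
           for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
               if (entry.getKey().startsWith(prefix)) {
                   props.put(entry.getKey().substring(prefix.length()), entry.getValue());
               }
           }
           return props;
       }
   
       // Returns the configured io-impl, or the assumed default if none is set.
       static String resolveIoImpl(Map<String, String> catalogProps) {
           return catalogProps.getOrDefault("io-impl", DEFAULT_IO_IMPL);
       }
   
       public static void main(String[] args) {
           // SparkSession conf as used for the procedure (values redacted as in the issue).
           Map<String, String> sparkConf = new HashMap<>();
           sparkConf.put("spark.sql.catalog.iceberg_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
           sparkConf.put("spark.sql.catalog.iceberg_catalog.s3.endpoint", "https://redacted");
           System.out.println(resolveIoImpl(catalogProperties(sparkConf, "iceberg_catalog")));
   
           // Hand-built properties as used for the action: no io-impl entry at all.
           Map<String, String> manualProps = new HashMap<>();
           manualProps.put("warehouse", "s3a://obs-zdp-warehouse-stage-mz/");
           System.out.println(resolveIoImpl(manualProps));
       }
   }
   ```
   
   On a real table, calling `table.io().getClass().getName()` after loading it 
either way should show which FileIO implementation is actually in use.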
   
   @danielcweeks FYI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

