Re: [PR] Core: HadoopFileIO to support bulk delete through the Hadoop Filesystem APIs [iceberg]

via GitHub Mon, 17 Mar 2025 19:47:19 -0700


danielcweeks commented on code in PR #10233:
URL: https://github.com/apache/iceberg/pull/10233#discussion_r1999636597



##########
docs/docs/aws.md:
##########
@@ -669,6 +665,63 @@ Users can use catalog properties to override the defaults. For example, to confi
 --conf spark.sql.catalog.my_catalog.http-client.apache.max-connections=5
 ```
 
+## Using the Hadoop S3A Connector
+
+The Apache Hadoop S3A Connector is an alternative to S3FileIO. It:
+* Uses an AWS v2 client qualified across a broad set of applications, including Apache Spark, Apache Hive, Apache HBase and more. This may lag the Iceberg artifacts.
+* Contains detection and recovery for S3 failures beyond that in the AWS SDK -recovery added "one support call at a time".
+* Supports scatter/gather IO "Vector IO" for high-performance Parquet reads.

Review Comment:
   Is vectored IO supported in the Iceberg read path? I wouldn't call out anything that isn't supported natively.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


