danielcweeks commented on code in PR #10233:
URL: https://github.com/apache/iceberg/pull/10233#discussion_r1999643749
########## docs/docs/aws.md: ##########
@@ -669,6 +665,63 @@ Users can use catalog properties to override the defaults. For example, to confi
 --conf spark.sql.catalog.my_catalog.http-client.apache.max-connections=5
 ```
+## Using the Hadoop S3A Connector
+
+The Apache Hadoop S3A Connector is an alternative to S3FileIO. It:
+* Uses an AWS SDK v2 client qualified across a broad set of applications, including Apache Spark, Apache Hive, Apache HBase and more. Its AWS SDK version may lag the one used by the Iceberg artifacts.
+* Contains detection and recovery for S3 failures beyond that in the AWS SDK, with recovery added "one support call at a time".
+* Supports scatter/gather IO ("Vector IO") for high-performance Parquet reads.
+* Supports Amazon S3 Express One Zone storage, FIPS endpoints, Client-Side Encryption, S3 Access Points and S3 Access Grants.
+* Supports OpenSSL as an optional TLS transport layer, for tangible performance improvements over the JDK implementation.
+* Includes [auditing via the S3 Server Logs](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/auditing.html), which can be used to answer important questions such as "who deleted all the files?" and "which job is triggering throttling?".
+* Collects [client-side statistics](https://apachecon.com/acasia2022/sessions/bigdata-1191.html) for identification of performance and connectivity issues.
+* Note: it does not support S3 Dual Stack, S3 Acceleration or S3 Tags.
+
+To use the S3A Connector:
+
+1. To store data using S3A, specify the `warehouse` catalog property to be an S3A path, e.g. `s3a://my-bucket/my-warehouse`

Review Comment:
```suggestion
To store data using S3A, specify the `warehouse` catalog property to be an S3A path, e.g. `s3://my-bucket/my-warehouse`
```
Prefer using the `s3` scheme as it's more universal.
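For context on the suggestion: one way to have the S3A connector serve an `s3://` warehouse is to bind the `s3` scheme to `S3AFileSystem` through the Hadoop `fs.s3.impl` property and route Iceberg through `HadoopFileIO` instead of S3FileIO. A minimal Spark sketch, assuming a Glue catalog named `my_catalog` and a bucket `my-bucket` (both placeholders, not from the PR), with the `hadoop-aws` and Iceberg AWS jars already on the classpath:

```shell
# Sketch only: catalog name, bucket and Glue catalog choice are placeholders.
# Assumes hadoop-aws (S3A) and the Iceberg Spark runtime + AWS jars are on the classpath.
# - warehouse keeps the portable s3:// scheme, as suggested in the review
# - io-impl switches Iceberg from S3FileIO to the Hadoop FileSystem API
# - spark.hadoop.fs.s3.impl maps the s3:// scheme to the S3A filesystem
spark-sql \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my-warehouse \
  --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
  --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```

Table metadata records absolute file paths, so whichever scheme the warehouse uses is the one every later reader of the table has to resolve.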