Re: [PR] AWS: Update S3 async client configurations and docs for analytics-accelerator-s3 [iceberg]

via GitHub Wed, 12 Mar 2025 09:32:52 -0700


jackye1995 commented on code in PR #12503:
URL: https://github.com/apache/iceberg/pull/12503#discussion_r1991879620



##########
docs/docs/aws.md:
##########
@@ -565,6 +565,29 @@ spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCata
 
 For more details on using S3 Acceleration, please refer to [Configuring fast, secure file transfers using Amazon S3 Transfer Acceleration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html).
 
+### S3 Analytics Accelerator
+
+The [Analytics Accelerator Library for Amazon S3](https://github.com/awslabs/analytics-accelerator-s3) helps you accelerate access to Amazon S3 data from your applications. This open-source solution reduces processing times and compute costs for your data analytics workloads.
+
+In order to enable S3 Analytics Accelerator Library to work in Iceberg, you can set the `s3.analytics-accelerator.enabled` catalog property to `true`. By default, this property is set to `false`.
+
+For example, to use S3 Analytics Accelerator with Spark 3.5, you can start the Spark SQL shell with:
+```
+spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
+    --conf spark.sql.catalog.my_catalog.type=glue \
+    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
+    --conf spark.sql.catalog.my_catalog.s3.analytics-accelerator.enabled=true
+```
+
+Our library can work with either the [S3 CRT client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html) or the [S3AsyncClient](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3AsyncClient.html). We recommend that you use the S3 CRT client due to its enhanced connection pool management and [higher throughput on downloads](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+By default, the S3 CRT client is enabled to work in Iceberg with the following configuration:
+- `s3.crt.enabled=true`
+- `s3.crt.max-concurrency=500`
+
+For additional configuration options, you can use properties prefixed with `s3.analytics-accelerator.`. For more details, please refer to [S3 Analytics Accelerator Configuration](https://github.com/awslabs/analytics-accelerator-s3/blob/main/doc/CONFIGURATION.md).

Review Comment:
   We should list all of the config keys in the table above, including the 
prefix key used to configure additional properties.
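
For illustration only, the catalog properties quoted in the diff can be gathered in one place and expanded into `spark-sql --conf` arguments. This is a minimal sketch, not part of the PR: the helper name `spark_conf_args` and the dictionary layout are hypothetical, and the `s3.crt.*` values simply restate the defaults listed in the diff.

```python
import shlex

# Keys and values are the ones quoted in the diff above. The dictionary
# itself and the helper below are illustrative and not part of Iceberg.
CATALOG_PROPERTIES = {
    "warehouse": "s3://my-bucket2/my/key/prefix",
    "type": "glue",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "s3.analytics-accelerator.enabled": "true",
    # Defaults stated in the diff, written out explicitly here.
    "s3.crt.enabled": "true",
    "s3.crt.max-concurrency": "500",
}


def spark_conf_args(catalog, properties,
                    catalog_impl="org.apache.iceberg.spark.SparkCatalog"):
    """Expand catalog properties into a flat list of --conf arguments."""
    prefix = f"spark.sql.catalog.{catalog}"
    args = ["--conf", f"{prefix}={catalog_impl}"]
    for key, value in properties.items():
        args += ["--conf", f"{prefix}.{key}={value}"]
    return args


if __name__ == "__main__":
    args = spark_conf_args("my_catalog", CATALOG_PROPERTIES)
    # Print a shell-safe command line equivalent to the example in the diff.
    print("spark-sql " + " ".join(shlex.quote(a) for a in args))
```

Further library options would follow the same pattern, using keys that start with the `s3.analytics-accelerator.` prefix.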



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


