HonahX commented on code in PR #12503:
URL: https://github.com/apache/iceberg/pull/12503#discussion_r1992593830


##########
docs/docs/aws.md:
##########
@@ -565,6 +565,81 @@ spark-sql --conf 
spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCata
 
 For more details on using S3 Acceleration, please refer to [Configuring fast, 
secure file transfers using Amazon S3 Transfer 
Acceleration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html).
 
+### S3 Analytics Accelerator
+
+The [Analytics Accelerator Library for Amazon 
S3](https://github.com/awslabs/analytics-accelerator-s3) helps you accelerate 
access to Amazon S3 data from your applications. This open-source solution 
reduces processing times and compute costs for your data analytics workloads.
+
+In order to enable S3 Analytics Accelerator Library to work in Iceberg, you 
can set the `s3.analytics-accelerator.enabled` catalog property to `true`. By 
default, this property is set to `false`.
+
+For example, to use S3 Analytics Accelerator with Spark, you can start the 
Spark SQL shell with:
+```
+spark-sql --conf 
spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf 
spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
+    --conf spark.sql.catalog.my_catalog.type=glue \
+    --conf 
spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
+    --conf spark.sql.catalog.my_catalog.s3.analytics-accelerator.enabled=true
+```
+
+The Analytics Accelerator Library can work with either the [S3 CRT 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
 or the 
[S3AsyncClient](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3AsyncClient.html).
 The library recommends that you use the S3 CRT client due to its enhanced 
connection pool management and [higher throughput on 
downloads](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+##### Client Configuration

Review Comment:
   ```suggestion
   #### Client Configuration
   ```
   Shall we use H4 heading here since the `S3 Analytics Accelerator` is H3?



##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java:
##########
@@ -98,6 +100,17 @@ public class S3FileIOProperties implements Serializable {
 
   public static final boolean S3_CRT_ENABLED_DEFAULT = true;
 
+  /** This property is used to specify the max concurrency for CRT Async 
clients. */
+  public static final String S3_CRT_MAX_CONCURRENCY = "s3.crt.max-concurrency";
+
+  /**
+   * To fully benefit from the analytics-accelerator-s3 library where this S3 
CRT client is used, it
+   * is recommended to initialize with higher concurrency.
+   *
+   * <p>For more details, see: 
https://github.com/awslabs/analytics-accelerator-s3
+   */
+  public static final int S3_CRT_MAX_CONCURRENCY_DEFAULT = 500;

Review Comment:
   Could we add a test for the new property? Currently we have 
   
https://github.com/apache/iceberg/blob/3dba6afb789a420373e44ac29ecdef866bd7ebee/aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIOProperties.java#L55
   for default values
   
   and 
   
   
https://github.com/apache/iceberg/blob/3dba6afb789a420373e44ac29ecdef866bd7ebee/aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIOProperties.java#L128
   
   for custom values



##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java:
##########
@@ -1098,6 +1150,10 @@ public <T extends S3ClientBuilder> void 
applyUserAgentConfigurations(T builder)
             .build());
   }
 
+  public S3CrtAsyncClientBuilder 
applyS3CrtConfigurations(S3CrtAsyncClientBuilder builder) {
+    return builder.maxConcurrency(s3CrtMaxConcurrency());
+  }
+

Review Comment:
   Could we add mock test similar to 
https://github.com/apache/iceberg/blob/3dba6afb789a420373e44ac29ecdef866bd7ebee/aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIOProperties.java#L487-L495
 ?



##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java:
##########
@@ -98,6 +100,17 @@ public class S3FileIOProperties implements Serializable {
 
   public static final boolean S3_CRT_ENABLED_DEFAULT = true;
 
+  /** This property is used to specify the max concurrency for CRT Async 
clients. */

Review Comment:
   Currently, we have inconsistencies in naming: `CRT Async Client`, `S3 CRT 
client`, `CRT client`, and `S3 CRT Async client`. How about unifying the 
terminology in the docstring for the CRT-based S3 client? Maybe we standardize 
it as `S3 CRT client` everywhere?



##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java:
##########
@@ -1004,6 +1026,36 @@ public <T extends S3ClientBuilder> void 
applyEndpointConfigurations(T builder) {
     }
   }
 
+  /**
+   * Override the endpoint for an S3 async client
+   *
+   * <p>Sample usage:
+   *
+   * <pre>
+   *     
S3AsyncClient.builder().applyMutation(s3FileIOProperties::applyEndpointConfigurations)
+   * </pre>
+   */
+  public <T extends S3AsyncClientBuilder> void applyEndpointConfigurations(T 
builder) {
+    if (endpoint != null) {
+      builder.endpointOverride(URI.create(endpoint));
+    }
+  }
+
+  /**
+   * Override the endpoint for an S3 CRT async client
+   *
+   * <p>Sample usage:
+   *
+   * <pre>
+   *     
S3AsyncClient.crtBuilder().applyMutation(s3FileIOProperties::applyEndpointConfigurations)
+   * </pre>
+   */
+  public <T extends S3CrtAsyncClientBuilder> void 
applyEndpointConfigurations(T builder) {
+    if (endpoint != null) {
+      builder.endpointOverride(URI.create(endpoint));
+    }
+  }

Review Comment:
   Could we add mock test for S3AsyncClient and S3SrcAsyncClientBuilder in
   
https://github.com/apache/iceberg/blob/3dba6afb789a420373e44ac29ecdef866bd7ebee/aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIOProperties.java#L487-L494



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to