[
https://issues.apache.org/jira/browse/HADOOP-17197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175017#comment-17175017
]
Sahil Takiar commented on HADOOP-17197:
---------------------------------------
I'm not proposing that S3A stop using the aws-java-sdk-bundle or stop relying
on shading, simply that we should try to find a way to decrease the size of
the dependency. There are multiple ways to do so:
* Use the shade plugin to create an S3A-specific version of
aws-java-sdk-bundle that includes only the AWS services that S3A actually needs
** We can use the shade plugin to remove service SDKs from aws-java-sdk-bundle
that S3A would never conceivably use - e.g. the bundle includes the service SDK
for Amazon Interactive Video Service:
[https://aws.amazon.com/about-aws/whats-new/2020/07/introducing-amazon-ivs/]
** An example of how to do this using Maven can be found here:
[https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
** The shade plugin can use an include or an exclude pattern
* Use the shade plugin to publish an S3A Uber-jar that contains S3A + shaded
SDK dependencies for the AWS services that S3A actually needs
** This would be in addition to the existing S3A jar
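As a rough sketch of the exclude-pattern approach above (untested against the actual bundle): the fragment below assumes a hypothetical module that depends on com.amazonaws:aws-java-sdk-bundle and re-shades it, and it assumes the Amazon IVS classes live under com/amazonaws/services/ivs/ inside the jar. Any additional resources belonging to an excluded service would need their own exclude patterns.

```xml
<!-- Hypothetical sketch: the enclosing module and the excluded package
     path are assumptions, not taken from the Hadoop build. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Pull only the AWS bundle into the shaded jar. -->
        <artifactSet>
          <includes>
            <include>com.amazonaws:aws-java-sdk-bundle</include>
          </includes>
        </artifactSet>
        <!-- Drop classes of a service SDK that S3A does not use. -->
        <filters>
          <filter>
            <artifact>com.amazonaws:aws-java-sdk-bundle</artifact>
            <excludes>
              <exclude>com/amazonaws/services/ivs/**</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The same filter element also accepts an includes list, so an allow-list of the packages S3A needs is possible, but it would have to cover the SDK core classes and the relocated third-party dependencies as well as the service packages themselves.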
> Decrease size of s3a dependencies
> ---------------------------------
>
> Key: HADOOP-17197
> URL: https://issues.apache.org/jira/browse/HADOOP-17197
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sahil Takiar
> Priority: Major
>
> S3A currently has a dependency on the aws-java-sdk-bundle, which includes the
> SDKs for all AWS services. The jar file for the current version is about 120
> MB, and the bundle continues to grow (the latest is about 170 MB). This growth
> is expected to continue as more and more AWS services are created.
> The aws-java-sdk-bundle jar file is shaded as well, so it includes all
> transitive dependencies.
> It would be nice if S3A could depend on smaller jar files in order to
> decrease the size of jar files pulled in transitively by clients. Decreasing
> the size of dependencies is particularly important for Docker images, where
> pull times can be affected by image size.
> One solution here would be for S3A to publish its own shaded jar which
> includes the SDKs for all needed AWS Services (e.g. S3, DynamoDB, etc.) along
> with the transitive dependencies for the individual SDKs.