[
https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130957#comment-15130957
]
Sangjin Lee commented on HADOOP-12747:
--------------------------------------
Thanks for the clarification [~jira.shegalov].
I can summarize the pros and cons of both approaches as follows.
(1) expand wildcard at GenericOptionsParser
- pro: provide a consistent libjars syntax with classpaths
- pro: no impact on the existing behavior
- pro: only needed files (i.e. jars) are uploaded to hdfs and localized
- con: increases the size of the configuration, since each expanded jar is
listed individually
(2) add the wildcard notation on the task classpath
- pro: smallest amount of changes; leverages existing functionality of
accepting and uploading a directory
- pro: configuration stays slim
- con: would upload and localize files unnecessarily if only the jars were
supposed to be used
- con: need to re-interpret or deprecate (minor) behavior, such as adding
libjar entries to the client classpath and allowing directories as a set of
classfiles
In terms of the amount of changes, I believe the changes for approach (1) are
still quite small. The only impactful change is expanding the wildcard in
GenericOptionsParser. All other changes are refactoring to use a common
implementation for that expansion and unit test changes.
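To make the expansion in approach (1) concrete, here is a rough sketch of the
JVM-style rule described in the issue: an entry ending in * is replaced by the
jars directly inside that directory, with no descent into child directories.
This is not the code in the attached patches. The class name LibJarsWildcard
and the method expand are hypothetical, and the sketch assumes local paths,
"/" as the separator, and the same .jar/.JAR matching the JVM applies to
classpath wildcards.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
public class LibJarsWildcard {

  /**
   * Expands a comma-separated libjars value. An entry of the form "dir/*"
   * (or a bare "*") becomes the list of jars directly inside that directory;
   * every other entry is passed through unchanged.
   */
  public static String expand(String libjars) throws IOException {
    List<String> result = new ArrayList<>();
    for (String raw : libjars.split(",")) {
      String entry = raw.trim();
      if (entry.isEmpty()) {
        continue;
      }
      if (entry.equals("*") || entry.endsWith("/*")) {
        // Strip the trailing "*"; a bare "*" means the current directory.
        Path dir = entry.equals("*")
            ? Paths.get(".")
            : Paths.get(entry.substring(0, entry.length() - 1));
        List<String> jars = new ArrayList<>();
        // newDirectoryStream lists only direct children, so child
        // directories are never traversed.
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
          for (Path child : children) {
            String name = child.getFileName().toString();
            // Assumption: only regular files ending in .jar or .JAR count.
            if (Files.isRegularFile(child)
                && (name.endsWith(".jar") || name.endsWith(".JAR"))) {
              jars.add(child.toString());
            }
          }
        }
        // Directory listing order is unspecified; sort for a stable result.
        Collections.sort(jars);
        result.addAll(jars);
      } else {
        result.add(entry);
      }
    }
    return String.join(",", result);
  }
}
```

The returned string lists every matched jar explicitly, which is where the
configuration size growth noted under approach (1) comes from.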
My proposal would be to proceed with approach (1). IMO there is a lot of value
in having libjars consistent with the classpath syntax. And if we can provide
that support with an equally small change, I don't see a reason not to do it.
I'd like to hear your thoughts, and others' too!
> support wildcard in libjars argument
> ------------------------------------
>
> Key: HADOOP-12747
> URL: https://issues.apache.org/jira/browse/HADOOP-12747
> Project: Hadoop Common
> Issue Type: New Feature
> Components: util
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch
>
>
> There is a problem when a user job adds too many dependency jars on its
> command line. The HADOOP_CLASSPATH part can be addressed, including using
> wildcards (\*). But the same cannot be done with the -libjars argument. Today
> it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this
> situation. The idea is to handle it the same way the JVM does it: \* expands
> to the list of jars in that directory. It does not traverse into any child
> directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't
> do it for -files and -archives).