[ 
https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130957#comment-15130957
 ] 

Sangjin Lee commented on HADOOP-12747:
--------------------------------------

Thanks for the clarification [~jira.shegalov].

I can summarize the pros and cons of both approaches as follows.
(1) expand the wildcard in GenericOptionsParser
- pro: provides a libjars syntax consistent with classpaths
- pro: no impact on the existing behavior
- pro: only needed files (i.e. jars) are uploaded to hdfs and localized
- con: increases the size of the configuration

(2) add the wildcard notation on the task classpath
- pro: smallest amount of changes; leverages existing functionality of 
accepting and uploading a directory
- pro: configuration stays slim
- con: would upload and localize files unnecessarily if only the jars were 
supposed to be used
- con: need to re-interpret or deprecate (minor) behavior, such as adding 
libjar entries to the client classpath and allowing directories as a set of 
class files

In terms of the amount of changes, I believe the changes for approach (1) are 
still quite small. The only impactful change is expanding the wildcard in 
GenericOptionsParser. All other changes are refactoring to use a common 
implementation for that expansion and unit test changes.
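
To make the expansion concrete, here is a minimal sketch of the idea. It is not the code in the attached patches: the class and method names are hypothetical, and it handles local paths only. It assumes the JVM classpath semantics from the issue description, i.e. a trailing "/*" expands to the jars directly inside that directory and child directories are not traversed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the actual GenericOptionsParser code.
public class LibjarsWildcardSketch {

    // Expands each comma-separated entry that ends in "/*" into the jar files
    // directly inside that directory. Other entries are passed through as-is.
    static List<String> expand(String libjars) throws IOException {
        List<String> result = new ArrayList<>();
        for (String raw : libjars.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (!entry.endsWith("/*")) {
                result.add(entry); // fully specified path: keep unchanged
                continue;
            }
            // Strip the trailing "/*"; a bare "/*" refers to the root directory.
            String dirName = entry.length() == 2
                ? "/" : entry.substring(0, entry.length() - 2);
            Path dir = Paths.get(dirName);
            List<String> jars = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path p : stream) {
                    String name = p.getFileName().toString();
                    // Only regular files with a .jar or .JAR extension, as the
                    // JVM does; subdirectories are skipped, not traversed.
                    if (Files.isRegularFile(p)
                            && (name.endsWith(".jar") || name.endsWith(".JAR"))) {
                        jars.add(p.toString());
                    }
                }
            }
            Collections.sort(jars); // directory order is unspecified; sort for stability
            result.addAll(jars);
        }
        return result;
    }
}
```

With something like this, an argument such as "-libjars /some/dir/*" would turn into the explicit list of jars in /some/dir before anything is uploaded, which is also why the configuration grows under approach (1).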

My proposal would be to proceed with approach (1). IMO there is a lot of value 
in having libjars consistent with the classpath syntax. And if we can provide 
that support with an equally small change, I don't see a reason not to do it.

I'd like to hear your thoughts, and others' too!

> support wildcard in libjars argument
> ------------------------------------
>
>                 Key: HADOOP-12747
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12747
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch
>
>
> There is a problem when a user job adds too many dependency jars in their 
> command line. The HADOOP_CLASSPATH part can be addressed, including using 
> wildcards (*). But the same cannot be done with the -libjars argument. Today 
> it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this 
> situation. The idea is to handle it the same way the JVM does it: * expands 
> to the list of jars in that directory. It does not traverse into any child 
> directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't 
> do it for -files and -archives).



