[ 
https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127312#comment-15127312
 ] 

Sangjin Lee commented on HADOOP-12747:
--------------------------------------

{quote}
My solution (tested many times with production pipelines I own) using dir 
uploads is obviously geared to the former case and not the case of an exploded 
archive, as you can see from the value of the application classpath. I was just 
suggesting automating it via this property or mapreduce.job.classpath.files.
{quote}

OK, could you elaborate a little on your proposal? Directories are already 
allowed in libjars. The issue is that, on the task side, a directory is added to 
the classpath as the directory name (i.e. {{directory}}) and not as a set of jars 
(i.e. {{directory/\*}}). Therefore the jars inside it are not recognized as part 
of the classpath. Does the proposal then entail adding the classpath entry as 
{{directory/\*}} instead of {{directory}}? Strictly speaking, that would be a 
re-interpretation (or a change of behavior) of a directory entry in libjars. 
Today, if you add a directory of class files to libjars, it works correctly. I 
can imagine a scenario where someone submits a job out of a dev environment and 
includes a source build directory as a libjars argument. I don't know whether 
that is an advertised feature, but I can see it working today, and if we change 
the behavior this way we would re-interpret such an argument to mean something 
else.
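To make the difference concrete, here is a hypothetical sketch (the class {{LibjarsClasspathSketch}} and its method are illustrative only, not Hadoop code) of how the same localized libjars entries would turn into a task-side classpath under today's behavior versus the re-interpretation. It assumes the set of entries that are directories is already known.

```java
import java.io.File;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of the two interpretations; not Hadoop's implementation.
public class LibjarsClasspathSketch {

    /**
     * Joins localized -libjars entries into a task-side classpath string.
     *
     * expandDirs == false (today's behavior): a directory is added as-is, so the
     * JVM loads .class files under it but ignores any jars it contains.
     * expandDirs == true (the re-interpretation): a directory becomes "dir/*",
     * so the JVM loads the jars directly inside it but no longer loads loose
     * .class files from it.
     */
    static String buildClasspath(List<String> entries, Set<String> directories,
                                 boolean expandDirs) {
        return entries.stream()
                .map(e -> (expandDirs && directories.contains(e)) ? e + "/*" : e)
                .collect(Collectors.joining(File.pathSeparator));
    }
}
```

For entries {{lib/a.jar}} and {{build/classes}} (a directory), the first mode yields {{lib/a.jar:build/classes}} on Unix-like systems and the second yields {{lib/a.jar:build/classes/\*}}, which would silently drop the class files under {{build/classes}}.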

Also, what should be uploaded and localized? Should everything in the directory 
be uploaded, or only the jars? Note that nothing other than the jars would be 
used for the classpath.

{quote}
I don't know whether libjars (<comma-separated list of jars>) is modeled after 
CLASSPATH. But I think there should be a separation of concerns: syntax vs. how 
it's implemented.
{quote}

I agree completely. What I was arguing for is that it would help users if we 
adopted a consistent *syntax* for classpath and libjars. This patch is just one 
way of implementing that syntax. It could also be implemented by accepting 
{{directory/\*}} as a directory, uploading only the jar files to HDFS, and adding 
{{directory/\*}} to the classpath. Even in that case, IMO we still need to leave 
a plain {{directory}} libjars argument working as it does today.
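The alternative implementation can be sketched per libjars argument as below. This is a hypothetical helper ({{LibjarsArgPlan}} and its methods are illustrative, not Hadoop APIs); it assumes {{directory/\*}} is the only new form, that "jar" means a name ending in .jar or .JAR (as with JVM classpath wildcards), and that the caller knows the name the entry was localized under.

```java
// Hypothetical sketch of the alternative syntax; names are illustrative only.
public class LibjarsArgPlan {

    final String localPath;  // client-side path to read from
    final boolean wildcard;  // true if the argument used the "directory/*" form

    private LibjarsArgPlan(String localPath, boolean wildcard) {
        this.localPath = localPath;
        this.wildcard = wildcard;
    }

    /** Classifies one libjars argument. */
    static LibjarsArgPlan of(String arg) {
        if (arg.endsWith("/*")) {
            // New form: "directory/*" refers to the jars in that directory.
            return new LibjarsArgPlan(arg.substring(0, arg.length() - 2), true);
        }
        // "directory" or "some.jar": unchanged from today.
        return new LibjarsArgPlan(arg, false);
    }

    /** Whether a file found under localPath would be uploaded to HDFS. */
    boolean shouldUpload(String fileName) {
        // Wildcard form uploads only jars; a plain directory keeps today's behavior.
        return !wildcard || fileName.endsWith(".jar") || fileName.endsWith(".JAR");
    }

    /** Task-side classpath entry, given the name the entry was localized under. */
    String classpathEntry(String localizedName) {
        return wildcard ? localizedName + "/*" : localizedName;
    }
}
```

Keeping the wildcard as an explicit, separate form means a plain {{build/classes}} argument is never re-interpreted: it is still uploaded in full and added to the classpath as a directory of class files.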


> support wildcard in libjars argument
> ------------------------------------
>
>                 Key: HADOOP-12747
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12747
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch
>
>
> There is a problem when a user job adds too many dependency jars on its 
> command line. The HADOOP_CLASSPATH part can be addressed, for example by using 
> wildcards (\*). But the same cannot be done with the -libjars argument. Today 
> it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this 
> situation. The idea is to handle it the same way the JVM does: \* expands 
> to the list of jars in that directory. It does not traverse into any 
> subdirectories.
> Also, it would probably be a good idea to do this only for libjars (i.e. not 
> for -files and -archives).
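The JVM-style expansion described in the issue can be sketched as follows. This is a hypothetical helper ({{LibjarsWildcardExpansion}}, {{expand}} and {{expandLibjars}} are illustrative names, not Hadoop APIs), assuming an entry ending in {{/\*}} is the wildcard form and that, like the JVM, it matches only regular files ending in .jar or .JAR directly inside that directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of JVM-style wildcard expansion for -libjars; not Hadoop code.
public class LibjarsWildcardExpansion {

    /**
     * Expands one -libjars entry. An entry ending in "/*" becomes the jar files
     * directly inside that directory (subdirectories are not traversed); any
     * other entry is returned unchanged.
     */
    static List<String> expand(String entry) throws IOException {
        if (!entry.endsWith("/*")) {
            return Collections.singletonList(entry);
        }
        Path dir = Paths.get(entry.substring(0, entry.length() - 2));
        List<String> jars = new ArrayList<>();
        // A DirectoryStream lists only the immediate children of dir.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{jar,JAR}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip directories named like "x.jar"
                    jars.add(p.toString());
                }
            }
        }
        // Listing order is unspecified; sort for a stable result.
        Collections.sort(jars);
        return jars;
    }

    /** Expands a comma-separated -libjars value; -files and -archives would not use this. */
    static List<String> expandLibjars(String libjars) throws IOException {
        List<String> out = new ArrayList<>();
        for (String entry : libjars.split(",")) {
            out.addAll(expand(entry.trim()));
        }
        return out;
    }
}
```

With a directory containing {{a.jar}}, {{b.txt}} and {{sub/c.jar}}, expanding {{<dir>/\*,other.jar}} gives {{<dir>/a.jar}} followed by {{other.jar}}: the text file and the jar in the subdirectory are not included.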



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
