[ https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127312#comment-15127312 ]
Sangjin Lee commented on HADOOP-12747:
--------------------------------------
{quote}
My solution (tested many times with production pipelines I own) using dir
uploads is obviously geared to the former case and not the case of an exploded
archive, as you can see from the value of the application classpath. I was just
suggesting to automate it via this property or mapreduce.job.classpath.files.
{quote}
OK, could you elaborate a little on your proposal? Currently directories are
already allowed in libjars. The catch is that on the task side the directory is
added to the classpath by its name (i.e. {{directory}}) and not as a set of
jars (i.e. {{directory/\*}}), so the jars inside it are not recognized as part
of the classpath. Does the proposal then entail adding the classpath entry as
{{directory/\*}} instead of {{directory}}? Strictly speaking, that would be a
re-interpretation (or a change of behavior) of a directory entry in libjars.
Today, if you add a directory of class files to libjars, that works correctly.
I can imagine a
scenario where someone submits a job out of a dev environment and includes a
source build directory as a libjars argument. Again, I don't know whether it is
an advertised feature/behavior, but at least I can see that working today, and
if we change it this way we would re-interpret that to mean something else.
Also, what should be uploaded and localized? Should everything be uploaded, or
only the jars? Note that anything other than jars would not be used for the
classpath.
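A toy model may make the compatibility concern concrete. The sketch below is hypothetical (class_sources is an illustrative name, not Hadoop or JVM code): it assumes the libjars directory has already been localized on the task side, takes the relative paths of the files under it, and reports where a class loader would read classes from for each form of classpath entry.

```python
from typing import List


def class_sources(entry: str, files: List[str]) -> List[str]:
    """Hypothetical toy model: where classes come from for one classpath entry.

    `files` holds POSIX paths relative to the localized directory.
    """
    if entry.endswith("/*"):
        # Wildcard entry: classes come only from jar files sitting directly
        # in the directory; subdirectories and non-jar files are ignored.
        return sorted(f for f in files
                      if "/" not in f and f.endswith((".jar", ".JAR")))
    # Plain directory entry: classes come from .class files in the package
    # tree; jar files inside the directory are never opened.
    return sorted(f for f in files if f.endswith(".class"))
```

With a build directory holding com/example/Foo.class and lib-a.jar, the plain entry sees only Foo.class while the wildcard entry sees only lib-a.jar. That is why silently turning {{directory}} into {{directory/\*}} would break a directory of class files that works today.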
{quote}
I don't know whether libjars (<comma-separated list of jars>) is modeled after
CLASSPATH. But I think there should be a separation of concerns: syntax vs. how
it's implemented.
{quote}
I agree completely. What I was arguing for is that it would be helpful for
users if we adopted a consistent *syntax* between classpath and libjars. This
patch is just one way of implementing that syntax. It could also be implemented
by accepting {{directory/\*}} as a directory argument, uploading only the jar
files in that directory to HDFS, and adding {{directory/\*}} to the classpath.
Again, in that case, IMO we should still leave the behavior of {{directory}} as
a libjars argument as it is today.
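A minimal sketch of that alternative, assuming a hypothetical client-side planner (plan_libjars and LibjarsPlan are illustrative names, not existing Hadoop APIs). It ignores the renaming that happens during localization and leaves plain {{directory}} and file entries exactly as they are handled today; the directory listing is injected so the sketch needs no real file system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LibjarsPlan:
    # Hypothetical: local paths that would be copied to the cluster.
    uploads: List[str] = field(default_factory=list)
    # Hypothetical: entries that would be added to the task classpath.
    classpath: List[str] = field(default_factory=list)


def plan_libjars(args: List[str],
                 list_dir: Callable[[str], List[str]]) -> LibjarsPlan:
    """Plan uploads and classpath entries for an already-split libjars list.

    list_dir(d) must return the names directly inside d (non-recursive).
    """
    plan = LibjarsPlan()
    for arg in args:
        if arg.endswith("/*"):
            directory = arg[:-2]
            # Upload only the jar files; other files would never be on
            # the classpath, so there is no point in shipping them.
            jars = sorted(n for n in list_dir(directory)
                          if n.endswith((".jar", ".JAR")))
            plan.uploads.extend(f"{directory}/{n}" for n in jars)
            # Keep the wildcard; the JVM expands it on the task side.
            plan.classpath.append(arg)
        else:
            # Plain file or directory: unchanged from today's behavior.
            plan.uploads.append(arg)
            plan.classpath.append(arg)
    return plan
```

For example, if deps contains a.jar, b.JAR, and notes.txt, planning ["deps/*", "build"] uploads deps/a.jar, deps/b.JAR, and build, and puts deps/* and build on the classpath.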
> support wildcard in libjars argument
> ------------------------------------
>
> Key: HADOOP-12747
> URL: https://issues.apache.org/jira/browse/HADOOP-12747
> Project: Hadoop Common
> Issue Type: New Feature
> Components: util
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch
>
>
> There is a problem when a user job adds too many dependency jars on its
> command line. The HADOOP_CLASSPATH part can be addressed, for example by using
> wildcards (\*), but the same cannot be done with the -libjars argument. Today
> it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this
> situation. The idea is to handle it the same way the JVM does it: \* expands
> to the list of jars in that directory. It does not traverse into any child
> directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't
> do it for -files and -archives).
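For reference, a sketch of the JVM-style expansion described in the issue, written as a hypothetical client-side helper (expand_libjars is not an existing Hadoop function). It follows the JVM rules for a classpath wildcard: {{dir/\*}} becomes the files ending in .jar or .JAR directly inside {{dir}}, subdirectories are not searched, and a bare {{\*}} means the current directory. The JVM does not specify an order, so the sketch sorts names for determinism; skipping anything that is not a regular file is also a choice of this sketch.

```python
import os
from typing import List


def expand_libjars_wildcard(entry: str) -> List[str]:
    """Expand one libjars entry the way the JVM expands a classpath wildcard."""
    if entry == "*":
        directory = "."
    elif entry.endswith("/*"):
        directory = entry[:-2] or "/"
    else:
        return [entry]  # not a wildcard: keep the fully specified path
    result = []
    for name in sorted(os.listdir(directory)):  # top level only, no recursion
        path = os.path.join(directory, name)
        if name.endswith((".jar", ".JAR")) and os.path.isfile(path):
            result.append(path)
    return result


def expand_libjars(value: str) -> List[str]:
    """Split a comma-separated libjars value and expand any wildcards."""
    expanded = []
    for entry in value.split(","):
        if entry.strip():
            expanded.extend(expand_libjars_wildcard(entry.strip()))
    return expanded
```

Only libjars would go through this helper; -files and -archives values would be passed through untouched, as the description suggests.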
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)