kkrugler opened a new issue #6492:
URL: https://github.com/apache/incubator-pinot/issues/6492


   Currently the code creates a tarball of the plugin directory inside of the 
Pinot distribution directory, and then calls
`job.addCacheArchive(file://<path to tarball>)`. This won't work, because all
that this call does is store the path in the JobConf. On each slave, this path
is then used as the source for copying the archive locally, but that
`file://xxx` path doesn't exist on the slaves.
   
   The Hadoop [DistributedCache documentation](https://hadoop.apache.org/docs/r2.8.3/api/org/apache/hadoop/filecache/DistributedCache.html) says:
   
   > Applications specify the files, via urls (hdfs:// or http://) to be cached 
via the JobConf. The DistributedCache assumes that the files specified via urls 
are already present on the FileSystem at the path specified by the url and are 
accessible by every machine in the cluster.
   
   So the `HadoopSegmentGenerationJobRunner` needs to copy the plugin tarball to
HDFS, and set the distributed cache path to that location. There are a few
options for where to copy these files. If you use the standard Hadoop command
line `-files xxx` parameter (as an example), then the standard Hadoop tool
framework will copy the file(s) to a job-specific directory inside of the
"staging" directory, so we could try to leverage that same location. But since
Pinot already requires that a staging directory be specified in the job spec
file, and this has to be in HDFS for a distributed job, we could instead use an
explicit sub-directory within that directory.
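   As a rough sketch of the sub-directory approach, the destination path and the distributed cache URI could be derived from the job spec's staging directory as below. `PluginCachePaths`, its methods, and the `plugins` sub-directory name are all hypothetical (nothing like them exists in Pinot today). The actual copy would use Hadoop's `FileSystem.copyFromLocalFile(Path, Path)`, and the resulting URI would be passed to `Job.addCacheArchive(URI)`. Those Hadoop calls appear only in comments so the sketch runs with just the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper (not part of Pinot): works out where the plugin tarball
 * would live under the job spec's staging directory, and which URI to hand to
 * the distributed cache.
 *
 * Intended use in the job runner (Hadoop calls shown as comments only):
 *   URI remote = PluginCachePaths.remoteTarballUri(stagingDir, "pinot-plugins.tar.gz");
 *   FileSystem fs = new Path(remote).getFileSystem(job.getConfiguration());
 *   fs.copyFromLocalFile(new Path(localTarball), new Path(remote));
 *   job.addCacheArchive(
 *       PluginCachePaths.cacheArchiveUri(stagingDir, "pinot-plugins.tar.gz", "pinot-plugins"));
 */
final class PluginCachePaths {
  // Assumed sub-directory name inside the staging dir; not an existing Pinot constant.
  static final String PLUGINS_SUBDIR = "plugins";

  private PluginCachePaths() {}

  /** e.g. hdfs://nn:8020/pinot/staging becomes hdfs://nn:8020/pinot/staging/plugins/{tarballName} */
  static URI remoteTarballUri(String stagingDir, String tarballName) {
    // A trailing slash makes resolve() append to the staging dir instead of
    // replacing its last path segment.
    String base = stagingDir.endsWith("/") ? stagingDir : stagingDir + "/";
    return URI.create(base).resolve(PLUGINS_SUBDIR + "/" + tarballName);
  }

  /**
   * URI for Job.addCacheArchive(URI). The "#linkName" fragment names the symlink
   * under which the unpacked archive appears in each task's working directory.
   */
  static URI cacheArchiveUri(String stagingDir, String tarballName, String linkName) {
    URI remote = remoteTarballUri(stagingDir, tarballName);
    // A file:// URI reproduces the problem described above: each node would
    // look for the archive on its own local disk.
    if ("file".equalsIgnoreCase(remote.getScheme())) {
      throw new IllegalArgumentException(
          "Staging dir must be on a shared filesystem such as HDFS: " + stagingDir);
    }
    try {
      return new URI(remote.getScheme(), remote.getAuthority(), remote.getPath(), null, linkName);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Bad link name: " + linkName, e);
    }
  }
}
```

   Keeping the tarball under the job spec's staging directory, rather than the job-specific directory that `-files` uses, avoids depending on Hadoop's internal staging layout; the trade-off is that the runner presumably becomes responsible for cleaning up that sub-directory afterwards.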

