I'm writing a plugin for Nutch 2. When built, it creates a job file which is then executed via Hadoop.
My plugin uses a third party library that requires a File object pointing to a directory of files. I looked at DistributedCache, but I'm not sure how to use it to get a File object.

On Thu, Oct 11, 2012 at 10:26 AM, Harsh J <[email protected]> wrote:
> Hi Bai,
>
> What exactly do you mean by a 'job file' and have you considered using
> DistributedCache, as detailed at
> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#DistributedCache
> ?
>
> On Thu, Oct 11, 2012 at 7:44 PM, Bai Shen <[email protected]> wrote:
> > I'm trying to reference a directory inside my job file from my code. I
> > have a third party library that I need to pass a File object to which
> > references the directory in the job file.
> >
> > How do I go about doing this? If I just do new File("dir") it looks for
> > the directory on the client machine that I'm calling the job from instead
> > of the directory in the actual job file itself.
> >
> > Thanks.
>
> --
> Harsh J
