Which api are you using? They changed between 0.18 and 0.20.this is the more recent version On May 8, 2012 3:55 AM, "sharat attupurath" <[email protected]> wrote:
> Hi Steve, > > I tried using the NShotInputFormat, but after setting it as the > InputFormat class and running my mapreduce job I get the following error : > > Exception in thread "main" java.lang.RuntimeException: class > tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat > at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915) > at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590) > at tsphadoop.TspHadoop.main(TspHadoop.java:40) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > > > Am i missing something? Is there anything else I should know about using > NShotInputFormat? > > Thanks and regards > > Sharat > > ------------------------------ > Date: Mon, 7 May 2012 09:24:05 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [email protected] > To: [email protected] > > Fair enough - I write a lot of InputFormats since for most of my problems > a line of text is not the proper unit - > I read fasta files - read lines intil you hit a line starting with > and > xml fragments - read until you hit a closing tag > > > On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[email protected]>wrote: > > > The default FileInputformat split the file according to the size. If you > use line text data, the TextFileInputformat respects the line structure for > input. We got splits as small as a few KBs. The file split is a tricky > business, especially when you want it to respect your logical boundary. It > is better to use the existing battle-test code than invent your own wheel. > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > [email protected] > Financial Engineering > Freddie Mac > > > > *Steve Lewis <[email protected]>* 05/07/2012 11:17 AM > Please respond to > [email protected] > > To > [email protected] > cc > Subject > Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > Yes but it is the job of the InputFormat code to implement the behavior - > it is not necessary to do so and in other cases I choose to create more > mappers when the mapper has a lot of work > > On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu > <*[email protected]*<[email protected]>> > wrote: > > We are using old API of 0.20. I think when you set "mapred.reduce.tasks" > with certain number N and use fileinputformat, the default behavior is that > any file will be split into that number, N, each split smaller than the > default block size. Of course, other restriction, such as > "mapred.min.split.size" cannot be set too large (default is as small as > possible I think). > > Zhu, Guojun > Modeling Sr Graduate* > **571-3824370** > **[email protected]* <[email protected]> > Financial Engineering > Freddie Mac > > > > > *sharat attupurath <**[email protected]* <[email protected]>*>* > 05/05/2012 11:37 AM > > > Please respond to* > **[email protected]* <[email protected]> > > To > <*[email protected]* <[email protected]>> > cc > Subject > RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > > > Since the input files are very small, the default input formats in Hadoop > all generate just a single InputSplit, so only a single map task is > executed, and we wont have much parallelism. > > I was thinking of writing an InputFormat that would read the whole file as > an InputSplit and replicate this input split n times (where n would be the > number of ants in a single stage) so that we'll have n mappers. > Also I want the input format to return the value as the adjacency matrix > of the graph (calculating it from the coordinates in the input file). > > But I can't find a way to do this. Is it possible? Or is it better to just > have the input as Text and create the adjacency matrix in the mappers? > > ------------------------------ > Date: Sat, 5 May 2012 08:20:34 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: *[email protected]* <[email protected]> > To: *[email protected]* <[email protected]> > > yes - if you know how you can put it in distributed cache or if it is > small put in as a String in the config or have all InputFormats read it > from somewhere > > On Sat, May 5, 2012 at 8:08 AM, sharat attupurath > <*[email protected]*<[email protected]>> > wrote: > I looked at both the files. in AbstractNShotInputFormat it is mentioned > that this input format does not read from files. My input is in a text > file. I want the whole file as a single record. So is it enough if i just > copy the contents of the file and return it as a string from > getValueFromIndex() ? > > ------------------------------ > Date: Fri, 4 May 2012 13:15:46 -0700 > > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: *[email protected]* <[email protected]> > To: *[email protected]* <[email protected]> > > Look at NShotInputFormat > call setNumberKeys to set the number of ants then > change the method getValueFromIndex to return the text representing the > original problem to each mapper - > what happens in the second and third jobs is an exercise but the saved > test needs to have pheremones and the original problem > > > On Fri, May 4, 2012 at 9:54 AM, sharat attupurath > <*[email protected]*<[email protected]>> > wrote: > Hi, > > Thanks a lot for the quick reply sir! We are new to Apache Hadoop and > haven't yet understood it properly yet. Can you please elaborate on how we > can have multiple stages of mapreduce jobs for combining the trails as you > have mentioned? > > We have been trying to find out how to write a custom splitter and almost > all online resources tell to subclass the FileInputFormat and write only a > custom RecordReader? Will it be possible to generate our splits in that way? > > Regards > > Sharat > > ------------------------------ > Date: Fri, 4 May 2012 09:22:51 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: *[email protected]* <[email protected]> > To: *[email protected]* <[email protected]> > > > > > On Fri, May 4, 2012 at 9:06 AM, sharat attupurath > <*[email protected]*<[email protected]>> > wrote: > Hi, > > We are trying to parallelize the ant colony optimization algorithm for TSP > over hadoop and are facing some issues. We are using TSPLIB as input files. > The input is a text file containing eucledian coordinates of the cities - > first column is city number and the next two columns contain the x and y > coordinates respectively. > > What we intend to do is to take input from this single file, send copies > of it to multiple mappers (each mapper acts like the ant in the algorithm), > each mapper works on its input to find its own TSP solution that it outputs > and finally the reducer outputs the smallest tour found by the mappers. > Hope we are in the right track. Here are the issues: > > 1) Since the input file is small, we need to force hadoop to fire up > multiple map tasks by replicating the input. How can we make an InputSplit > of the whole file and replicate it so that the input can be sent to > multiple mappers? > Write a custom splitter sending the same data to all mappers - the only > critical criteria is the number of "Ants" > > 2) the algorithm uses a shared pheromone array and each mapper needs to > read and write data from this. How can we share the pheromone data across > the mappers. > You cannot share data across mappers and should not attempt to do so - > Better to use the reducer(s) to combine trails combine first pass trails > and > then pass the combined trails to another mapreduce job - this with the > original problem plus the current pheremone trails > > Hope the questions are clear enough. Any help would be greatly > appreciated. > > Thank you > > Regards > > Sharat > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 * > **206-384-1340* (cell) > Skype lordjoe_com > > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 * > **206-384-1340* (cell) > Skype lordjoe_com > > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 * > **206-384-1340* (cell) > Skype lordjoe_com > > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 > 206-384-1340 (cell) > Skype lordjoe_com > > > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 > 206-384-1340 (cell) > Skype lordjoe_com > > >
