Thanks! So you invert the data and then walk through each inverted result. Good point! What do you think about prefixing each city name with its index in the list?
This way you can say:

London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1, 3_Berlin:1...

From this list you can see that people are likely to visit Moscow right after London, on their first or second journey. This would maintain a strong order (whether that's good or bad depends on the real-world scenario).

Since your ideas gave me a good starting point for implementing this job (I'll practice it), we can make the problem more heavyweight, if you like. What happens to records that are too big to be processed by one node? Let's say the strongly ordered list from my example above yields a billion combinations, way too many for one node (let's assume that). What possibilities does Hadoop offer to deal with such cases?

Regards and many thanks for the insights,
Em

On 19.07.2011 19:15, Steve Lewis wrote:
> Assume Joe visits Washington, London, Paris and Moscow
>
> You start with records like
> Joe:Washington:20-Jan-2011
> Joe:London:14-Feb-2011
> Joe:Paris:9-Mar-2011
>
> You want
> Joe: Washington, London, Paris and Moscow
>
> For the next step the person is irrelevant
> you want
>
> Washington: London:1, Paris:1, Moscow:1
> London: Paris:1, Moscow:1
> Paris: Moscow:1
>
> The first says that after a visit to Washington there was one visit to London,
> one to Paris and one to Moscow
>
> This can be combined with the one from Joe
>
> Now suppose Bill visits London and Moscow
> So he generates
> London: Moscow:1
>
> This can be combined with the one from Joe saying London: Paris:1 and
> Moscow:1
> to give
>
> London: Paris:1 and Moscow:2
>
> Now suppose Sue visits London and Riga and Paris
> So she generates
> London: Paris:1, Riga:1
>
> This can be combined with London: Paris:1 and Moscow:2 to give
>
> London: Paris:2 and Moscow:2, Riga:1
>
> Note I can keep places in alphabetical order in the result
>
> On Tue, Jul 19, 2011 at 9:53 AM, Em <[email protected]> wrote:
>
> > Hi Steven,
> >
> > thanks for your response! For ease of use we can make the
> > assumptions you made - maybe this makes it much easier to help. Those
> > little extras are something for after solving the "easy" version of the
> > task. :)
> >
> > What do you mean by the following?
> >
> > > The second job takes Person : list of places and return for each place
> > > in the list constructs
> > > place : 1 | place after P : 1 | next place : 1 ...
> >
> > Do you mean something like this?
> >
> > Washington DC:1
> > New York after Washington DC:1
> > Miami after New York:1
> >
> > I do not see how that helps with the result I would like to get.
> >
> > The end result should be something like this:
> > Washington DC => New York, Miami, Los Angeles
> > New York => Chicago, Seattle, San Francisco
> >
> > The point is that one can see that people who visited Washington DC
> > are likely to visit New York as the next place, Miami as the second and
> > L.A. as the third.
> > However, if I choose New York as my starting point, I can see that
> > people who start their journey in New York (and maybe weren't in DC
> > before) are likely to visit Chicago, Seattle and San Francisco. Maybe
> > Los Angeles comes at the 10th position.
> >
> > Regards,
> > Em
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
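
For reference, the inversion and combining steps discussed in this thread can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not Hadoop code: the names successor_counts and combine, the with_position flag and the in-memory journeys dictionary are all made up for this example. In a real job the per-traveler output would presumably come from a mapper and the merging would happen in a combiner or reducer.

```python
from collections import Counter, defaultdict


def successor_counts(journey, with_position=False):
    """For one traveler's ordered journey, count the places visited after each place.

    With with_position=True, each later place is prefixed with how many stops
    after the starting place it was visited (the "1_Moscow" idea).
    """
    result = defaultdict(Counter)
    for i, start in enumerate(journey):
        for offset, later in enumerate(journey[i + 1:], start=1):
            key = f"{offset}_{later}" if with_position else later
            result[start][key] += 1
    return result


def combine(partials):
    """Merge per-traveler results by adding up the counts for each starting place."""
    merged = defaultdict(Counter)
    for partial in partials:
        for start, counts in partial.items():
            merged[start].update(counts)  # Counter.update adds counts
    return merged


# Hypothetical input, already grouped per person and sorted by visit date.
journeys = {
    "Joe": ["Washington", "London", "Paris", "Moscow"],
    "Bill": ["London", "Moscow"],
    "Sue": ["London", "Riga", "Paris"],
}

plain = combine(successor_counts(j) for j in journeys.values())
ordered = combine(successor_counts(j, with_position=True) for j in journeys.values())

print(dict(plain["London"]))
print(dict(ordered["London"]))
```

For London, the plain variant gives Paris:2, Moscow:2, Riga:1, which matches the combined result in the quoted example. Because adding counts does not depend on the order in which partial results arrive, the same merge could in principle run on partial results before the final reduce step.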
