Thanks!

So you invert the data and then walk through each inverted result.
Good point!
What do you think about prefixing each city name with its position in the list?

This way you can say:
London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1, 3_Berlin:1...

From this list you can see that people are likely to visit Moscow shortly
after London, as their first or second stop. This would preserve a
strict order (whether that's good or bad depends on the real-world scenario).
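A minimal in-memory sketch of the position-prefixed idea, in Java since that is what Hadoop jobs are usually written in. The names `PositionalPairs` and `countAfter` are hypothetical, and the trips are the Joe/Bill/Sue trips from Steve's example below, so the counts differ from the illustrative numbers above. In a real job the emitted `"1_Moscow"`-style keys would be the mapper output and the summing would happen in the reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: for every place in an ordered trip, count the places that
// follow it, prefixing each with its distance (1 = next stop, 2 = the one after, ...).
class PositionalPairs {
    static Map<String, Map<String, Integer>> countAfter(List<List<String>> trips) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (List<String> trip : trips) {
            for (int i = 0; i < trip.size() - 1; i++) { // the last place has no successors
                Map<String, Integer> counts =
                        result.computeIfAbsent(trip.get(i), k -> new TreeMap<>());
                for (int j = i + 1; j < trip.size(); j++) {
                    // e.g. "1_Moscow" = Moscow was the first stop after trip.get(i)
                    counts.merge((j - i) + "_" + trip.get(j), 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> trips = List.of(
                List.of("Washington", "London", "Paris", "Moscow"), // Joe
                List.of("London", "Moscow"),                        // Bill
                List.of("London", "Riga", "Paris"));                // Sue
        System.out.println(countAfter(trips).get("London"));
        // {1_Moscow=1, 1_Paris=1, 1_Riga=1, 2_Moscow=1, 2_Paris=1}
    }
}
```

Because the keys are plain strings, `10_Berlin` would sort before `2_Paris`; zero-padding the position (e.g. `02_Paris`) avoids that if the sort order matters.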

Since your ideas gave me a good starting point for implementing this job
(I'll practice it), we could make the problem more heavyweight, if you like.

What happens to records that are too big to be processed by a single node?
Let's say that, in my strongly ordered example above, one key ends up with a
billion combinations: far too many for one node (let's assume so).
What options does Hadoop offer for dealing with this?

Regards and many thanks for the insights,
Em


On 19.07.2011 19:15, Steve Lewis wrote:
> Assume Joe visits Washington, London, Paris and Moscow
> 
> You start with records like
> Joe:Washington:20-Jan-2011
> Joe:London:14-Feb-2011
> Joe:Paris:9-Mar-2011
> 
> You want
> Joe: Washington, London, Paris and Moscow
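A sketch of this first step, assuming the `person:place:d-MMM-yyyy` record layout shown above and at most one visit per person per day. `TripBuilder` and `buildTrips` are hypothetical names; in Hadoop the person would be the map output key and the date ordering would be done in the reducer (or with a secondary sort).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical first job, in memory: group "person:place:date" records by
// person and order each person's places by visit date.
class TripBuilder {
    // Matches dates such as "20-Jan-2011"; English month abbreviations assumed.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("d-MMM-yyyy", Locale.ENGLISH);

    static Map<String, List<String>> buildTrips(List<String> records) {
        // person -> (visit date -> place); the inner TreeMap keeps dates sorted.
        // Assumes at most one visit per person per day (a second one would overwrite).
        Map<String, TreeMap<LocalDate, String>> byPerson = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(":");
            String person = fields[0].trim();
            String place = fields[1].trim(); // trim() tolerates "Paris :" style spacing
            LocalDate date = LocalDate.parse(fields[2].trim(), DATE);
            byPerson.computeIfAbsent(person, k -> new TreeMap<>()).put(date, place);
        }
        Map<String, List<String>> trips = new TreeMap<>();
        byPerson.forEach((person, visits) -> trips.put(person, new ArrayList<>(visits.values())));
        return trips;
    }

    public static void main(String[] args) {
        // The three records from the example, deliberately out of order.
        List<String> records = List.of(
                "Joe:Paris:9-Mar-2011",
                "Joe:Washington:20-Jan-2011",
                "Joe:London:14-Feb-2011");
        System.out.println(buildTrips(records)); // {Joe=[Washington, London, Paris]}
    }
}
```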
> 
> For the next step the person is irrelevant; you want
> 
> 
> Washington: London:1, Paris:1, Moscow:1
> London: Paris:1, Moscow:1
> Paris: Moscow:1
> The first says that after a visit to Washington there was one visit to
> London, one to Paris and one to Moscow.
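The per-trip expansion in this step can be sketched as follows. `FollowingPlaces` and `followers` are hypothetical names; each place maps to a count of every place visited after it, and the last place is skipped because nothing follows it, which matches the three lines above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical second step for a single trip: map each place to a count of
// every place visited after it. The person is no longer needed.
class FollowingPlaces {
    static Map<String, Map<String, Integer>> followers(List<String> trip) {
        Map<String, Map<String, Integer>> out = new LinkedHashMap<>(); // keeps trip order
        for (int i = 0; i < trip.size() - 1; i++) { // the last place has no successors
            Map<String, Integer> counts = out.computeIfAbsent(trip.get(i), k -> new TreeMap<>());
            for (int j = i + 1; j < trip.size(); j++) {
                counts.merge(trip.get(j), 1, Integer::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(followers(List.of("Washington", "London", "Paris", "Moscow")));
        // {Washington={London=1, Moscow=1, Paris=1}, London={Moscow=1, Paris=1}, Paris={Moscow=1}}
    }
}
```

In a MapReduce job each entry of this map would be one mapper output record keyed by the starting place.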
> 
> 
> This can be combined with the one from Joe
> 
> 
> Now suppose Bill visits London and Moscow
> So he generates
> London: Moscow:1
> 
> This can be combined with the one from Joe saying London: Paris:1,
> Moscow:1 to give
> 
> London: Paris:1, Moscow:2
> 
> Now suppose Sue visits London, Riga and Paris.
> So she generates
> London: Paris:1, Riga:1
> 
> This can be combined with London: Paris:1, Moscow:2 to give
> 
> London: Paris:2, Moscow:2, Riga:1
> 
> Note that I can keep the places in alphabetical order in the result.
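The combining step amounts to summing counts per place. A minimal sketch with hypothetical names (`CountMerger`, `merge`); a `TreeMap` gives the alphabetical ordering for free. Because addition is associative and commutative, the same logic could serve as both the combiner and the reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical combine/reduce step: sum the per-place counts that different
// people produced for the same starting place.
class CountMerger {
    static SortedMap<String, Integer> merge(List<Map<String, Integer>> partials) {
        SortedMap<String, Integer> total = new TreeMap<>(); // alphabetical by place
        for (Map<String, Integer> partial : partials) {
            partial.forEach((place, n) -> total.merge(place, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // The three "London" records from Joe, Bill and Sue.
        SortedMap<String, Integer> london = merge(List.of(
                Map.of("Paris", 1, "Moscow", 1),  // Joe
                Map.of("Moscow", 1),              // Bill
                Map.of("Paris", 1, "Riga", 1)));  // Sue
        System.out.println("London: " + london); // London: {Moscow=2, Paris=2, Riga=1}
    }
}
```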
> 
> 
> 
> On Tue, Jul 19, 2011 at 9:53 AM, Em <[email protected]> wrote:
> 
>     Hi Steven,
> 
>     Thanks for your response! For ease of use we can make the
>     assumptions you made; maybe that makes it much easier to help. Those
>     little extras can wait until the "easy" version of the task is
>     solved. :)
> 
>     What do you mean by the following?
> 
>     > The second job takes Person : list of places and, for each place
>     > in the list, constructs
>     > place : 1 | place after P : 1 | next place : 1 ...
> 
>     Do you mean something like this?
> 
>     Washington DC:1
>     New York after Washington DC:1
>     Miami after New York:1
> 
>     I don't see how this helps with the result I'd like to get.
> 
>     The end result should look something like this:
>     Washington DC => New York, Miami, Los Angeles
>     New York => Chicago, Seattle, San Francisco
> 
>     The point is that one can see that people who visited Washington DC
>     are likely to visit New York next, Miami second and L.A. third.
>     However, if I choose New York as my starting point, I can see that
>     people who start their journey in New York (and perhaps weren't in DC
>     before) are likely to visit Chicago, Seattle and San Francisco. Maybe
>     Los Angeles comes in 10th position.
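Given position-prefixed counts like the London line at the top of this thread, an end result of this shape can be read off by picking the most frequent place at each position. A sketch with hypothetical names (`Ranking`, `topPerPosition`); breaking ties alphabetically is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical final step: from "position_place" -> count (e.g. "2_Riga" -> 4),
// pick the most frequent place at each position. Ties go to the alphabetically
// first place, an arbitrary choice.
class Ranking {
    static List<String> topPerPosition(Map<String, Integer> counts) {
        TreeMap<Integer, String> bestPlace = new TreeMap<>(); // position -> best place so far
        Map<Integer, Integer> bestCount = new HashMap<>();    // position -> its count
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int sep = e.getKey().indexOf('_');
            int position = Integer.parseInt(e.getKey().substring(0, sep));
            String place = e.getKey().substring(sep + 1);
            int n = e.getValue();
            Integer best = bestCount.get(position);
            boolean better = best == null || n > best
                    || (n == best && place.compareTo(bestPlace.get(position)) < 0);
            if (better) {
                bestPlace.put(position, place);
                bestCount.put(position, n);
            }
        }
        return new ArrayList<>(bestPlace.values()); // ordered by position
    }

    public static void main(String[] args) {
        // London's counts as listed at the top of the thread (the elided rest ignored).
        Map<String, Integer> london = Map.of(
                "1_Moscow", 2, "1_Paris", 2, "2_Moscow", 1,
                "2_Riga", 4, "2_Paris", 1, "3_Berlin", 1);
        System.out.println("London => " + String.join(", ", topPerPosition(london)));
        // London => Moscow, Riga, Berlin
    }
}
```

Each position is ranked independently, so the same city could win at two positions; suppressing repeats would need an extra step.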
> 
>     Regards,
>     Em
> 
> 
> 
> 
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
> 
> 
