Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
On Wed, May 12, 2010 at 5:46 PM, Johan Oskarsson wrote: > Looking over the code, this is in fact an issue in 0.6. > It's fixed in trunk/0.7. Connections will be reused and closed properly, see > https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. > > We can either backport that

Re: timeout while running simple hadoop job

2010-05-12 Thread Johan Oskarsson
Looking over the code, this is in fact an issue in 0.6. It's fixed in trunk/0.7. Connections will be reused and closed properly, see https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. We can either backport that patch or at least make it close the connections properly in 0.6. Ca
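
For anyone stuck on 0.6 meanwhile, the workaround amounts to closing the Thrift transport after every use. A minimal sketch of the connection lifecycle, assuming the 0.6-era Thrift client (host and port here are placeholders):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ThriftConnectionExample {
        public static void main(String[] args) throws Exception {
            // Open one Thrift connection, use it, and always close it;
            // leaking these is what exhausts sockets and file descriptors.
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            try {
                // ... issue get_range_slices / get / insert calls here ...
            } finally {
                transport.close();
            }
        }
    }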

Re: timeout while running simple hadoop job

2010-05-12 Thread Héctor Izquierdo
Have you checked your open file handle limit? You can do that by using "ulimit" in the shell. If it's too low, you will encounter the "too many open files" error. You can also see how many open file handles an application has with "lsof". Héctor Izquierdo On 12/05/10 17:00, gabriele renzi wrote:
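
If you would rather check from inside the JVM, a small sketch (Linux-only, relying on /proc; it counts the current process's descriptors, roughly what "lsof -p <pid>" would list):

    import java.io.File;

    public class OpenFdCount {
        public static void main(String[] args) {
            // /proc/self/fd holds one entry per open file descriptor
            // of the current process (Linux only).
            String[] fds = new File("/proc/self/fd").list();
            System.out.println("open file descriptors: "
                    + (fds == null ? "unknown (no /proc?)" : fds.length));
        }
    }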

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
On Wed, May 12, 2010 at 4:43 PM, Jonathan Ellis wrote: > On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: >> - is it possible that such errors show up on the client side as >> timeoutErrors when they could be reported better? > > No, if the node the client is talking to doesn't get a reply

Re: timeout while running simple hadoop job

2010-05-12 Thread Jonathan Ellis
On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: > - is it possible that such errors show up on the client side as > timeoutErrors when they could be reported better? No, if the node the client is talking to doesn't get a reply from the data node, there is no way for it to magically find ou

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
A follow-up for anyone who may end up on this conversation again: I kept trying, and neither changing the number of concurrent map tasks nor the slice size helped. Finally, I found a screw-up in our logging system, which had prevented us from noticing a couple of recurring errors in the logs

Re: timeout while running simple hadoop job

2010-05-07 Thread Joost Ouwerkerk
The number of map tasks for a job is a function of the InputFormat, which in the case of ColumnFamilyInputFormat is a function of the global number of keys in Cassandra. The number of concurrent maps being executed at any given time per TaskTracker (per node) is set by mapred.tasktracker.map.tasks.maximum

Re: timeout while running simple hadoop job

2010-05-07 Thread Joseph Stein
You can manage the number of map tasks per node with mapred.tasktracker.map.tasks.maximum=1 On Fri, May 7, 2010 at 9:53 AM, gabriele renzi wrote: > On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis wrote: >> Sounds like you need to configure Hadoop to not create a whole bunch >> of Map tasks at once >
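
A sketch of where each knob lives, assuming the 0.20-era old API (class and property names as discussed in this thread):

    import org.apache.hadoop.mapred.JobConf;

    public class MapTaskTuning {
        public static JobConf configure() {
            JobConf conf = new JobConf(MapTaskTuning.class);
            // Per-job *hint* for the number of map tasks; the InputFormat's
            // splits ultimately decide the real count.
            conf.setNumMapTasks(12);
            // mapred.tasktracker.map.tasks.maximum, by contrast, caps the
            // concurrent map slots on each node. It is read by the TaskTracker
            // daemon from mapred-site.xml at start-up, so setting it on a
            // JobConf has no effect.
            return conf;
        }
    }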

Re: timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis wrote: > Sounds like you need to configure Hadoop to not create a whole bunch > of Map tasks at once Interesting; from a quick check it seems there are a dozen threads running. Yet setNumMapTasks seems to be deprecated (together with JobConf) and

Re: timeout while running simple hadoop job

2010-05-07 Thread Matt Revelle
On May 7, 2010, at 9:40, gabriele renzi wrote: > On Fri, May 7, 2010 at 2:53 PM, Matt Revelle wrote: > re: not reporting, I thought this was not needed with the new mapred api (Mapper class vs Mapper interface), plus I can see that the mappers do work, report percentage and happily terminate

Re: timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
On Fri, May 7, 2010 at 2:53 PM, Matt Revelle wrote: > There's also the mapred.task.timeout property that can be tweaked.  But > reporting is the correct way to fix timeouts during execution. re: not reporting, I thought this was not needed with the new mapred api (Mapper class vs Mapper interf
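
With the new API, the old Reporter's role moves to the mapper's Context; a minimal sketch of keeping a task alive from inside map() (the type parameters are illustrative):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ProgressMapper extends Mapper<Text, Text, Text, IntWritable> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... expensive per-record work ...
            // Ping the framework so the task is not killed for inactivity
            // after mapred.task.timeout milliseconds.
            context.progress();
            context.setStatus("processing " + key);
        }
    }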

Re: timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
On Fri, May 7, 2010 at 3:02 PM, Joost Ouwerkerk wrote: > Joseph, the stacktrace suggests that it's Thrift that's timing out, > not the Task. > > Gabriele, I believe that your problem is caused by too much load on > Cassandra.  Get_range_slices is presently an expensive operation. I > had some succ

Re: timeout while running simple hadoop job

2010-05-07 Thread Jonathan Ellis
The whole point is to parallelize to use the available capacity across multiple machines. If you go past that point (fairly easy when you have a single machine) then you're just contending for resources, not making things faster. On Fri, May 7, 2010 at 7:48 AM, Joost Ouwerkerk wrote: > Huh? Isn'

Re: timeout while running simple hadoop job

2010-05-07 Thread Joost Ouwerkerk
Joseph, the stacktrace suggests that it's Thrift that's timing out, not the Task. Gabriele, I believe that your problem is caused by too much load on Cassandra. Get_range_slices is presently an expensive operation. I had some success in reducing (although, it turns out, not eliminating) this prob
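
One way to lighten each get_range_slices call is to shrink the row batch the input format fetches per request; a hedged sketch (the cassandra.range.batch.size property name is taken from later ConfigHelper versions and is an assumption for 0.6; verify against your release):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SliceTuning {
        public static void shrinkBatches(Job job) {
            Configuration conf = job.getConfiguration();
            // Fewer rows per get_range_slices round-trip keeps each Thrift
            // request short enough to finish inside the RPC timeout.
            // Property name is an assumption, not confirmed for 0.6.
            conf.setInt("cassandra.range.batch.size", 256);
        }
    }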

Re: timeout while running simple hadoop job

2010-05-07 Thread Matt Revelle
There's also the mapred.task.timeout property that can be tweaked. But reporting is the correct way to fix timeouts during execution. On May 7, 2010, at 8:49 AM, Joseph Stein wrote: > The problem could be that you are crunching more data than will be > completed within the timeout interval setti
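
For reference, the per-job knob itself, in an old-API sketch (the 600000 ms default and the meaning of 0 are from the Hadoop docs of that era):

    import org.apache.hadoop.mapred.JobConf;

    public class TimeoutTuning {
        public static JobConf configure() {
            JobConf conf = new JobConf(TimeoutTuning.class);
            // Kill tasks only after 30 minutes without progress reports
            // (default is 600000 ms = 10 minutes; 0 disables the timeout).
            conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);
            return conf;
        }
    }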

Re: timeout while running simple hadoop job

2010-05-07 Thread Joseph Stein
The problem could be that you are crunching more data than will be completed within the timeout interval setting. In Hadoop you need to tell the task tracker that you are still doing work, which is done by setting a status or incrementing a counter on the Reporter object. http://allthingshadoo
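
A minimal old-API sketch of that pattern (the key/value types are illustrative):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class KeepAliveMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // ... long-running work on this record ...
            // Either of these resets the task's inactivity timer:
            reporter.setStatus("still working on " + key);
            reporter.incrCounter("keepalive", "records", 1);
        }
    }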

Re: timeout while running simple hadoop job

2010-05-07 Thread Joost Ouwerkerk
Huh? Isn't that the whole point of using Map/Reduce? On Fri, May 7, 2010 at 8:44 AM, Jonathan Ellis wrote: > Sounds like you need to configure Hadoop to not create a whole bunch > of Map tasks at once > > On Fri, May 7, 2010 at 3:47 AM, gabriele renzi wrote: >> Hi everyone, >> >> I am trying to

Re: timeout while running simple hadoop job

2010-05-07 Thread Jonathan Ellis
Sounds like you need to configure Hadoop to not create a whole bunch of Map tasks at once On Fri, May 7, 2010 at 3:47 AM, gabriele renzi wrote: > Hi everyone, > > I am trying to develop a mapreduce job that does a simple > selection+filter on the rows in our store. > Of course it is mostly based

timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
Hi everyone, I am trying to develop a mapreduce job that does a simple selection+filter on the rows in our store. Of course it is mostly based on the WordCount example :) Sadly, while it seems the app runs fine on a test keyspace with little data, when run on a larger test index (but still on a
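
For context, a minimal sketch of the kind of selection+filter mapper described here, assuming the 0.6-era ColumnFamilyInputFormat, which hands each mapper a row key as a String and the selected columns as a SortedMap<byte[], IColumn>; these signatures follow the contrib/word_count example and should be treated as assumptions, as should the "needle" filter value:

    import java.io.IOException;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SelectFilterMapper
            extends Mapper<String, SortedMap<byte[], IColumn>, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(String key, SortedMap<byte[], IColumn> columns, Context context)
                throws IOException, InterruptedException {
            for (IColumn column : columns.values()) {
                // Select rows whose column value matches the filter.
                String value = new String(column.value());
                if (value.contains("needle")) {
                    context.write(new Text(key), ONE);
                }
                // Keep the task alive during long range scans.
                context.progress();
            }
        }
    }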