On Wed, May 12, 2010 at 5:46 PM, Johan Oskarsson wrote:
Looking over the code this is in fact an issue in 0.6.
It's fixed in trunk/0.7. Connections will be reused and closed properly, see
https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details.
We can either backport that patch or at least make sure the connections are
closed properly in 0.6.
Have you checked your open file handle limit? You can do that with
"ulimit -n" in the shell. If it's too low, you will run into the "too many
open files" error. You can also see how many files an application has
open with "lsof".
Héctor Izquierdo
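If it is useful to watch the same numbers from inside the JVM (for instance from a small check run next to the job), a rough sketch along these lines reports the process's own descriptor usage. The FdUsage class name is made up here, and the com.sun.management.UnixOperatingSystemMXBean it casts to is a Sun/OpenJDK-specific assumption, so ulimit/lsof remain the authoritative tools:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Prints how many file descriptors this JVM has open and what its limit is.
// Works only on Unix-like JVMs that expose the com.sun.management bean.
public class FdUsage {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
            System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("file descriptor stats not available on this JVM/OS");
        }
    }
}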
On Wed, May 12, 2010 at 4:43 PM, Jonathan Ellis wrote:
On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote:
> - is it possible that such errors show up on the client side as
> timeoutErrors when they could be reported better?
No, if the node the client is talking to doesn't get a reply from the
data node, there is no way for it to magically find out what went wrong.
A follow-up for anyone who may end up on this conversation again:
I kept trying, and neither changing the number of concurrent map tasks
nor the slice size helped.
Finally, I found a screw-up in our logging system, which had
prevented us from noticing a couple of recurring errors in the logs.
The number of map tasks for a job is a function of the InputFormat,
which in the case of ColumnFamilyInputFormat is a function of the global
number of keys in Cassandra. The number of concurrent maps being
executed at any given time per TaskTracker (per node) is set by
mapred.tasktracker.map.tasks.maximum (mapred.tasktracker.reduce.tasks.maximum
is the equivalent for reduces).
You can manage the number of concurrent map tasks per node with
mapred.tasktracker.map.tasks.maximum=1
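For reference, a minimal sketch of where these knobs live, assuming the Hadoop 0.20-era property names used in this thread. Only the client-side view is printed: the per-node concurrency cap is read by each TaskTracker from its own mapred-site.xml at start-up, so setting it in job code has no effect.

import org.apache.hadoop.conf.Configuration;

// Sketch only: which setting controls what.
// "mapred.map.tasks" is merely a hint; the real number of map tasks comes from
// InputFormat.getSplits() (for the Cassandra input format, from the number of keys).
// "mapred.tasktracker.map.tasks.maximum" caps how many of those tasks one node runs
// at the same time and belongs in each TaskTracker's mapred-site.xml.
public class MapTaskKnobs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        System.out.println("map task hint (mapred.map.tasks): "
                + conf.get("mapred.map.tasks", "2"));
        System.out.println("concurrent maps per node (mapred.tasktracker.map.tasks.maximum): "
                + conf.get("mapred.tasktracker.map.tasks.maximum", "2"));
    }
}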
On Fri, May 7, 2010 at 9:53 AM, gabriele renzi wrote:
On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis wrote:
> Sounds like you need to configure Hadoop to not create a whole bunch
> of Map tasks at once
interesting, from a quick check it seems there are a dozen threads running.
Yet, setNumMapTasks seems to be deprecated (together with JobConf).
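A minimal sketch of the contrast being pointed out here, assuming plain Hadoop 0.20 APIs (nothing Cassandra-specific): the old JobConf.setNumMapTasks() is deprecated and was only ever a hint, while in the new org.apache.hadoop.mapreduce API the split count produced by the InputFormat decides how many map tasks run, with "mapred.map.tasks" as the corresponding hint.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

public class MapCountExample {
    public static void main(String[] args) throws Exception {
        // Old API: deprecated along with JobConf, and only a hint even then.
        JobConf oldConf = new JobConf();
        oldConf.setNumMapTasks(12);

        // New API: there is no setNumMapTasks(); the InputFormat's getSplits()
        // decides the count. The equivalent hint is the "mapred.map.tasks" property.
        Configuration conf = new Configuration();
        conf.setInt("mapred.map.tasks", 12);
        Job job = new Job(conf, "selection-filter");
        // job.setMapperClass(...), job.setInputFormatClass(...) etc. as usual.
    }
}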
On May 7, 2010, at 9:40, gabriele renzi wrote:
On Fri, May 7, 2010 at 2:53 PM, Matt Revelle wrote:
> There's also the mapred.task.timeout property that can be tweaked. But
> reporting is the correct way to fix timeouts during execution.
re: not reporting, I thought this was not needed with the new mapred
API (Mapper class vs Mapper interface); plus I can see that the
mappers do work, report their percentage, and happily terminate.
The whole point is to parallelize to use the available capacity across
multiple machines. If you go past that point (fairly easy when you
have a single machine) then you're just contending for resources, not
making things faster.
On Fri, May 7, 2010 at 7:48 AM, Joost Ouwerkerk wrote:
> Huh? Isn't that the whole point of using Map/Reduce?
On Fri, May 7, 2010 at 3:02 PM, Joost Ouwerkerk wrote:
Joseph, the stacktrace suggests that it's Thrift that's timing out,
not the Task.
Gabriele, I believe that your problem is caused by too much load on
Cassandra. Get_range_slices is presently an expensive operation. I
had some success in reducing (although, it turns out, not eliminating)
this problem.
There's also the mapred.task.timeout property that can be tweaked. But
reporting is the correct way to fix timeouts during execution.
On May 7, 2010, at 8:49 AM, Joseph Stein wrote:
> The problem could be that you are crunching more data than will be
> completed within the interval expire setting.
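A hedged sketch of the timeout tweak mentioned above, assuming the 0.20-era property name mapred.task.timeout (milliseconds; default 600000, i.e. ten minutes). As noted, raising it only hides the symptom; reporting progress is the real fix.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutTweak {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Kill a task only after 20 minutes without any progress report
        // (the default is 600000 ms = 10 minutes).
        conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);
        Job job = new Job(conf, "selection-filter");
        // ... set mapper, input format, etc., then job.waitForCompletion(true);
    }
}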
The problem could be that you are crunching more data than will be
completed within the interval expire setting.
In Hadoop you need to tell the task tracker that you are still doing
work, which is done by setting a status or incrementing a counter on
the Reporter object.
http://allthingshadoo
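To make the reporting point concrete, here is a minimal sketch using the new org.apache.hadoop.mapreduce API discussed in this thread. The input/output types are placeholders (with Cassandra's input format the key and value would be the row key and its columns), and doExpensiveWork()/matchesFilter() are hypothetical helpers; the context.progress()/setStatus()/getCounter() calls are the new-API equivalent of reporting on the old Reporter object and keep the TaskTracker from timing the task out during long per-record work.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Placeholder types; the point is the reporting calls inside map().
public class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Imagine an expensive per-record step (e.g. a slow lookup or a big slice):
        // anything that can take minutes should report in between.
        for (int i = 0; i < 100; i++) {
            doExpensiveWork(value, i);        // hypothetical helper

            // Tell the TaskTracker we are alive so mapred.task.timeout never fires.
            context.progress();
            context.setStatus("processed chunk " + i + " of key " + key);
            context.getCounter("filter", "chunks.processed").increment(1);
        }

        if (matchesFilter(value)) {           // hypothetical predicate
            context.write(new Text(value), NullWritable.get());
        }
    }

    private void doExpensiveWork(Text value, int chunk) { /* placeholder */ }

    private boolean matchesFilter(Text value) { return true; /* placeholder */ }
}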
Huh? Isn't that the whole point of using Map/Reduce?
On Fri, May 7, 2010 at 8:44 AM, Jonathan Ellis wrote:
> Sounds like you need to configure Hadoop to not create a whole bunch
> of Map tasks at once
Sounds like you need to configure Hadoop to not create a whole bunch
of Map tasks at once
On Fri, May 7, 2010 at 3:47 AM, gabriele renzi wrote:
Hi everyone,
I am trying to develop a mapreduce job that does a simple
selection+filter on the rows in our store.
Of course it is mostly based on the WordCount example :)
Sadly, while it seems the app runs fine on a test keyspace with little
data, when run on a larger test index (but still on a