That means they are idle: blocked waiting for something to be added to the task queue.
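To make that concrete, here is a minimal sketch of a worker thread parked on an empty queue. It is not Cassandra's stage code: the class name, the thread name "SKETCH-STAGE:1", and the use of a plain LinkedBlockingQueue are assumptions for illustration only. The point is that a worker with nothing to do reports WAITING in a thread dump, with its stack ending in a park call.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class IdleWorkerSketch {
    // Starts one worker that pulls tasks from an empty queue and
    // returns the thread state observed once it has gone idle.
    static Thread.State observeIdleWorkerState() throws InterruptedException {
        LinkedBlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    // take() parks the thread while the queue is empty
                    tasks.take().run();
                }
            } catch (InterruptedException e) {
                // interrupted by the caller: let the thread exit
            }
        }, "SKETCH-STAGE:1"); // hypothetical name, not a real stage
        worker.setDaemon(true);
        worker.start();

        // Poll for up to five seconds until the worker has parked.
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (worker.getState() != Thread.State.WAITING
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State state = worker.getState();
        worker.interrupt();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("idle worker state: " + observeIdleWorkerState());
    }
}
```

Seeing many threads in this state usually means the pool has spare capacity, not that the threads are stuck.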
On Mon, May 17, 2010 at 9:42 AM, Joost Ouwerkerk wrote:
> At any given moment at least half of those threads are in the following
> state; what does it represent?
> Name: ROW-READ-STAGE:6
> State: WAITING on
> java.util.conc
At any given moment at least half of those threads are in the following
state; what does it represent?
Name: ROW-READ-STAGE:6
State: WAITING on
java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@fea6030
Total blocked: 44 Total waited: 479
Stack trace:
sun.misc.Unsafe.park(Nati
On Sun, May 16, 2010 at 2:52 PM, Joost Ouwerkerk wrote:
> Meanwhile, I'm still getting TimedOutException errors when mapping this
> 30-million row table, even when retrieving no data at all. It looks like it
> is related to disk activity on "hot" nodes (when the same cassandra node has
> to handl
Hadoop doesn't make any assumptions about how input source data is
distributed. It can't 'know' that the data for the first 30 splits emitted
by the InputFormat are all stored on the same cassandra node.
The new case with the patch is CASSANDRA-1096
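As a rough illustration of the imbalance and of one way to counter it, the sketch below models each split as just the name of the node that hosts it, then compares an ordered split list with a shuffled one. It is not taken from the CASSANDRA-1096 patch; the class and method names, the "node-N" labels, and the 30 x 30 layout are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

public class SplitOrderSketch {
    // Builds an ordered split list: all splits for node-0 first,
    // then node-1, and so on. Each entry is the hosting node's name.
    static List<String> orderedSplits(int nodes, int splitsPerNode) {
        List<String> splits = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            for (int s = 0; s < splitsPerNode; s++) {
                splits.add("node-" + n);
            }
        }
        return splits;
    }

    // Returns a shuffled copy so consecutive splits are unlikely
    // to target the same node. The input list is left untouched.
    static List<String> shuffled(List<String> splits, long seed) {
        List<String> copy = new ArrayList<>(splits);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Counts how many distinct nodes the first `count` splits touch.
    static int distinctNodesInFirst(List<String> splits, int count) {
        int end = Math.min(count, splits.size());
        return new HashSet<>(splits.subList(0, end)).size();
    }

    public static void main(String[] args) {
        List<String> ordered = orderedSplits(30, 30);
        List<String> mixed = shuffled(ordered, System.nanoTime());
        System.out.println("ordered, nodes hit by first 30 splits: "
                + distinctNodesInFirst(ordered, 30));
        System.out.println("shuffled, nodes hit by first 30 splits: "
                + distinctNodesInFirst(mixed, 30));
    }
}
```

With the ordered list, the first 30 map tasks all read from a single node; after shuffling they are spread over many nodes, which is the effect a scheduler that knows nothing about data placement cannot produce on its own.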
Meanwhile, I'm still getting TimedOutException
Oh, very interesting. I assumed Hadoop would be smart enough to
load-balance the jobs it sends out. Guess not.
Can you submit a patch?
On Wed, May 12, 2010 at 12:32 PM, Joost Ouwerkerk wrote:
> I've been trying to improve the time it takes to map 30 million rows using a
> hadoop / cassandra cl
I've been trying to improve the time it takes to map 30 million rows using a
hadoop / cassandra cluster with 30 nodes. I discovered that since
CassandraInputFormat returns an ordered list of splits, when there are many
splits (e.g. hundreds or more) the load on cassandra is horribly unbalanced.
e