Re: timeout while running simple hadoop job

2010-05-12 Thread Johan Oskarsson
Looking over the code, this is in fact an issue in 0.6. It's fixed in trunk/0.7, where connections are reused and closed properly; see https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. We can either backport that patch or at least make sure the connections are closed properly in 0.6. Ca

Re: ColumnFamilyOutputFormat?

2010-05-03 Thread Johan Oskarsson
I wrote this CassandraOutputFormat last year. It most likely does not work against newer/current versions of Cassandra, but if you want something to work with, it can be used as a starting point. http://github.com/johanoskarsson/cassandraoutputformat /Johan On 30 apr 2010, at 14.14, Utku Can T

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Johan Oskarsson
I have written some code to avoid Thrift reconnection; it just keeps the connection open between get_range_slices calls. I can extract that and put it up, but not until early next week. /Johan On 23 apr 2010, at 05.09, Jonathan Ellis wrote: > That would be an easy win, sure. > > On Thu, Apr 22
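
For readers following the thread, here is a minimal sketch of the connection-reuse idea mentioned above. It is not Johan's code and does not use the real Thrift or Cassandra client: FakeConnection and ReusingReader are hypothetical names, and the fake get_range_slices only imitates a batched range read. It only illustrates opening one connection lazily, reusing it for every batch, and closing it once at the end.

```python
class FakeConnection:
    """Hypothetical stand-in for a Thrift client connection (not a real Cassandra API)."""

    opened = 0  # counts how many connections were ever opened

    def __init__(self):
        FakeConnection.opened += 1
        self.closed = False

    def get_range_slices(self, start_key, count):
        # Pretend to return `count` consecutive row keys starting at start_key.
        if self.closed:
            raise RuntimeError("connection is closed")
        return [start_key + i for i in range(count)]

    def close(self):
        self.closed = True


class ReusingReader:
    """Opens one connection lazily and reuses it for every batch."""

    def __init__(self, connect=FakeConnection):
        self._connect = connect  # factory that opens a new connection
        self._conn = None

    def fetch_batch(self, start_key, count):
        # Connect only on first use; later batches reuse the same connection.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn.get_range_slices(start_key, count)

    def close(self):
        # Close the connection exactly once, if it was ever opened.
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions
```

Used as a context manager (with ReusingReader() as reader: ...), the reader closes its connection even if a batch raises, which is the "closed properly" half of the problem discussed in this thread.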

Re: Cassandra and hadoop?

2010-03-17 Thread Johan Oskarsson
Hi Matteo, * Hadoop MapReduce can talk to Cassandra and process the data just as other input formats do from HDFS. But I would not recommend seeing Cassandra as a first-class replacement for HDFS; they are two very different beasts. It will most likely always be a lot faster to let MapRed