Thanks Jeremy,
It makes sense to abstract out CFOF and CFRW (right now they are tightly
bound to Avro), so that one can plug in a custom serializer (Avro, Thrift,
and going forward I guess maybe CQL). I will create a JIRA and submit
a patch with the needed changes. Surely, I will ping you if I
One thing that could be done is to abstract the CFRW further so that it's
easier to extend, with only the serialization mechanism needing to be
replaced. That is, all of the core functionality relating to Cassandra would
live in an abstract class or something like that. Then the Avro based
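Something like the following, perhaps (a rough sketch with hypothetical
names, not the actual Cassandra classes):

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.hadoop.mapreduce.RecordWriter;

// Hypothetical sketch: all the Cassandra plumbing (client setup, the
// internal mutation queue, the batch_mutate() calls) would live in an
// abstract base class...
public abstract class AbstractColumnFamilyRecordWriter<Y>
        extends RecordWriter<ByteBuffer, List<Y>> {

    // ...while a concrete subclass (Avro-based, Thrift-based, ...) only
    // supplies the serialization step into the wire-format mutation.
    protected abstract org.apache.cassandra.thrift.Mutation toThrift(Y mutation);
}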
There certainly could be a Thrift based record writer. However, (if I remember
correctly) to enable Hadoop output streaming, it was easier to go with Avro
for the records, as the schema is included. There could also have been a
Thrift version of the record writer, but it's simpler to just
Hi all,
While integrating Hadoop with Cassandra, I wanted to serialize
mutations, so I used Thrift mutations in my M/R jobs.
In the course of this, I came to know that CFRW accepts only Avro
mutations. Can someone please explain to me why only the Avro transport
is supported by CFRW? Why not, bo
On Tue, Jan 25, 2011 at 12:09 PM, Mick Semb Wever wrote:
> Well your key is a mutable Text object, so I can see some possibility
> depending on how Hadoop uses these objects.
Yes, that's it exactly. We recently fixed a bug in the demo
word_count program for this. Now we do
ByteBuffer.wrap(Arrays
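Presumably that continues roughly like this (my reconstruction, not a
verbatim quote from the demo; getMutation() stands in for whatever builds
the mutation):

// inside reduce(); Arrays, Collections and ByteBuffer are from
// java.util / java.nio
byte[] keyCopy = Arrays.copyOf(key.getBytes(), key.getLength());
context.write(ByteBuffer.wrap(keyCopy),
              Collections.singletonList(getMutation(key)));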
On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote:
> BTW, how do I get the current time in microseconds in Java?
I'm using HFactory.clock() (from Hector).
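For reference, a JDK-only alternative (a minimal sketch; the Micros class
name is just illustrative):

public final class Micros {
    private Micros() {}

    // Millisecond wall clock scaled to microseconds. The resolution is
    // still only 1 ms, but the magnitude matches the microsecond
    // timestamps Cassandra clients conventionally use.
    public static long timestamp() {
        return System.currentTimeMillis() * 1000L;
    }
}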
> > As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..)
> > won't this hurt performance?
>
> The size of the queue is comp
On Wed, Jan 26, 2011 at 08:58, Mck wrote:
>> You are correct that microseconds would be better but for the test it
>> doesn't matter that much.
>
> Have you tried? I'm very new to Cassandra as well, and always uncertain
> as to what to expect...
IMHO it's a matter of use-case. In my use-case there
> > is "d.timestamp = System.currentTimeMillis();" ok?
>
> You are correct that microseconds would be better but for the test it
> doesn't matter that much.
Have you tried? I'm very new to Cassandra as well, and always uncertain
as to what to expect...
> ByteBuffer bbKey = ByteBufferUtil.clo
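Presumably that quoted line continues with Cassandra's
ByteBufferUtil.clone(..); my guess at the full statement:

import java.nio.ByteBuffer;
import org.apache.cassandra.utils.ByteBufferUtil;

// Wrap the Text's valid bytes, then clone() copies them into a fresh
// buffer that Hadoop's object reuse cannot later overwrite.
ByteBuffer bbKey = ByteBufferUtil.clone(
        ByteBuffer.wrap(key.getBytes(), 0, key.getLength()));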
On Tue, Jan 25, 2011 at 19:09, Mick Semb Wever wrote:
> In fact I have another problem (trying to write an empty byte[], or
> something, as a key, which put one whole row out of whack, ((one row in
> 25 million...))).
>
> But I'm debugging along the same code.
>
> I don't quite understand how the
On Tue, 2011-01-25 at 14:16 +0100, Patrik Modesto wrote:
> The attached file contains the working version with the cloned key in
> the reduce() method. My other approach was:
>
> > context.write(ByteBuffer.wrap(key.getBytes(), 0, key.getLength()),
> > Collections.singletonList(getMutation(key)));
>
> Wh
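For comparison, a condensed sketch of what that working version presumably
looks like (the types and getMutation() are assumptions based on the
snippets in this thread):

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    // Clone the key's backing bytes: Hadoop reuses this Text instance,
    // and the CFRW only queues the mutation, so an un-copied buffer
    // can change before batch_mutate() is actually called.
    byte[] copy = Arrays.copyOf(key.getBytes(), key.getLength());
    context.write(ByteBuffer.wrap(copy),
                  Collections.singletonList(getMutation(key)));
}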
Hi Mick,
attached is a very simple MR job that deletes expired URLs from my
test Cassandra DB. The keyspace looks like this:

Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Replication Factor: 2
  Column Families:
    ColumnFamily: Url2
      Columns sort
On Tue, 2011-01-25 at 09:37 +0100, Patrik Modesto wrote:
> While developing a really simple MR task, I've found that a
> combination of a Hadoop optimization and the Cassandra
> ColumnFamilyRecordWriter queue creates wrong keys to send to
> batch_mutate().
I've seen similar beha
Hi,
I'm playing with Cassandra 0.7.0 and Hadoop, developing simple MapReduce
tasks. While developing a really simple MR task, I've found that a
combination of a Hadoop optimization and the Cassandra
ColumnFamilyRecordWriter queue creates wrong keys to send to
batch_mutate(). The problem is in the reduce
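To make the interaction concrete, here is a condensed illustration
(hypothetical, pieced together from the snippets above):

// Hadoop reuses `key` across reduce() calls, while the CFRW only
// queues the mutation for a background thread to send later.
context.write(ByteBuffer.wrap(key.getBytes(), 0, key.getLength()),
              Collections.singletonList(getMutation(key)));
// By the time the queued mutation reaches batch_mutate(), the Text's
// backing array may already hold the next key, so the wrong row gets
// mutated. Copying the bytes first (e.g. Arrays.copyOf) avoids this.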