Re: Feeding in specific Cassandra columns into Hadoop

2010-05-03 Thread Mark Schnitzius
> > You should test that getSlicePredicate(conf).equals(originalPredicate)
> >
That's it! The byte arrays are slightly different after setting it on the Hadoop config. Below is a simple test which demonstrates the bug -- it should print "true" but instead prints "false". Please let me know if a

Re: Feeding in specific Cassandra columns into Hadoop

2010-05-03 Thread Jonathan Ellis
We serialize the SlicePredicate as part of the Hadoop Configuration string. It's quite possible that either
- one of your column names is exposing a bug in the Thrift json serializer
- Hadoop is silently truncating large predicates
You should test that getSlicePredicate(conf).equals(originalPr
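The round-trip check suggested in this message can be sketched in plain Java. This is only an illustration under stated assumptions: a `Map` stands in for the Hadoop Configuration, and the class `PredicateRoundTrip`, the key `example.predicate`, and the helpers `viaUtf8` and `viaHex` are hypothetical names, not part of Cassandra, Hadoop, or Thrift. The lossy UTF-8 path is one assumed way a byte array could come back changed after being stored as a string; the thread does not confirm it as the cause.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PredicateRoundTrip {
    // Hypothetical key, chosen for this sketch only.
    static final String KEY = "example.predicate";

    // Stores raw bytes as UTF-8 text and reads them back.
    // Byte sequences that are not valid UTF-8 are replaced on decode,
    // so the returned array can differ from the input.
    public static byte[] viaUtf8(Map<String, String> conf, byte[] raw) {
        conf.put(KEY, new String(raw, StandardCharsets.UTF_8));
        return conf.get(KEY).getBytes(StandardCharsets.UTF_8);
    }

    // Stores raw bytes as a hex string and reads them back.
    // Every byte value maps to exactly two hex digits, so nothing is lost.
    public static byte[] viaHex(Map<String, String> conf, byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        conf.put(KEY, sb.toString());

        String stored = conf.get(KEY);
        byte[] out = new byte[stored.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(stored.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        byte[] asciiName = "name".getBytes(StandardCharsets.US_ASCII);
        byte[] binaryName = {(byte) 0x00, (byte) 0xff, (byte) 0xfe, (byte) 0x80};

        // Compare what went in with what came out, as the message suggests.
        System.out.println(Arrays.equals(asciiName, viaUtf8(conf, asciiName)));   // true
        System.out.println(Arrays.equals(binaryName, viaUtf8(conf, binaryName))); // false
        System.out.println(Arrays.equals(asciiName, viaHex(conf, asciiName)));    // true
        System.out.println(Arrays.equals(binaryName, viaHex(conf, binaryName)));  // true
    }
}
```

Running the same comparison against the real configuration helper, with the exact column names that go missing, would show whether the stored predicate still matches the original.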

Re: Feeding in specific Cassandra columns into Hadoop

2010-05-03 Thread Mark Schnitzius
If I take the exact same SlicePredicate that fails in the Hadoop example, and pass it in to a multiget_slice, the data is returned successfully. So it appears the problem does lie somewhere in the tie-in to Hadoop. I will try to create a maximally-trimmed-down example that's complete enough to ru

Re: Feeding in specific Cassandra columns into Hadoop

2010-05-03 Thread Jonathan Ellis
Can you reproduce outside the Hadoop environment, i.e. w/ Thrift code?
On Mon, May 3, 2010 at 5:49 AM, Mark Schnitzius wrote:
> Hi all...  I am trying to feed a specific list of Cassandra column names in
> as input to a Hadoop process, but for some reason it only feeds in some of
> the columns I

Feeding in specific Cassandra columns into Hadoop

2010-05-03 Thread Mark Schnitzius
Hi all... I am trying to feed a specific list of Cassandra column names in as input to a Hadoop process, but for some reason it only feeds in some of the columns I specify, not all. This is a short description of the problem - I'll see if anyone might have some insight before I dump a big load of