Re: Data Distribution / Replication

2010-08-13 Thread Stefan Kaufmann
> My recommendation is to leave Autobootstrap disabled, copy the > datafiles over, and then run cleanup. It is faster and more reliable > than streaming, in my experience. I thought about copying the data manually. However, if I have a running environment and add a node (or replace a broken one), h

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
Number of bugs I've hit doing this with scp: 0 Number of bugs I've hit with streaming: 2 (and others found more) Also easier to monitor progress, manage bandwidth, etc. I just prefer using specialized tools that are really good at specific things. This is such a case. b On Fri, Aug 13, 2010 at

Re: Count rows

2010-08-13 Thread Jonathan Ellis
Well, it's a bad idea, except when it isn't. I think I'm okay with our api evolving to handle more corner cases. It's true that it runs the risk of encouraging bad design from new users though. On Fri, Aug 13, 2010 at 1:07 PM, Gary Dusbabek wrote: > Should we close https://issues.apache.org/jir

RE: error using get_range_slice with random partitioner

2010-08-13 Thread David McIntosh
Adam, I'm using my own code to iterate that is similar to what Dave Viner posted except in C#. Given that it works in 0.6.3 I'd like to think that the code is ok unless this type of iteration isn't supported. I was going to try iterating using tokens today but it turns out it's not so easy to ge

Re: Data Distribution / Replication

2010-08-13 Thread Bill de hÓra
On Fri, 2010-08-13 at 09:51 -0700, Benjamin Black wrote: > My recommendation is to leave Autobootstrap disabled, copy the > datafiles over, and then run cleanup. It is faster and more reliable > than streaming, in my experience. What is less reliable about streaming? Bill

[RELEASE] 0.7.0 beta1

2010-08-13 Thread Eric Evans
Happy Friday the 13th. Are you feeling lucky? I know I am. Ok, first off, a disclaimer. As the suffix on the version indicates this is *beta* software. If you run off and upgrade a production server with this there is a very good chance that you are going to be sad/fired/mocked/ridiculed/laug

Re: Count rows

2010-08-13 Thread Gary Dusbabek
Should we close https://issues.apache.org/jira/browse/CASSANDRA-653 then? Fetching a count of all rows is just a specific instance of fetching the count of a range of rows. I spoke to a programmer at the summit who was working on this ticket mainly as a way of getting familiar with the codebase.

Re: Cassandra and Pig

2010-08-13 Thread Stu Hood
> Still I get an exception which I cannot explain where it comes > from (http://pastebin.com/JYfSSfny) Which version of Cassandra are you using? The 0.6 series requires that a valid storage-conf.xml is distributed with the job to specify connection/partitioner/etc information, but trunk/0.7-beta2

Re: Cassandra and Pig

2010-08-13 Thread Stu Hood
Hmm, the example code there may not have been run in distributed mode recently, or perhaps Pig performs some magic to automatically register Jars containing classes directly referenced as UDFs. -Original Message- From: "Christian Decker" Sent: Friday, August 13, 2010 12:16pm To: user@ca

Re: Count rows

2010-08-13 Thread Mark
On 8/13/10 10:52 AM, Jonathan Ellis wrote: because it would work amazingly poorly w/ billions of rows. it's an antipattern. On Fri, Aug 13, 2010 at 10:50 AM, Mark wrote: On 8/13/10 10:44 AM, Jonathan Ellis wrote: not without fetching all of them with get_range_slices On Fri, Aug 1

Re: Count rows

2010-08-13 Thread Jonathan Ellis
because it would work amazingly poorly w/ billions of rows. it's an antipattern. On Fri, Aug 13, 2010 at 10:50 AM, Mark wrote: > On 8/13/10 10:44 AM, Jonathan Ellis wrote: >> >> not without fetching all of them with get_range_slices >> >> On Fri, Aug 13, 2010 at 10:37 AM, Mark  wrote: >> >>> >>>

Re: Count rows

2010-08-13 Thread Mark
On 8/13/10 10:44 AM, Jonathan Ellis wrote: not without fetching all of them with get_range_slices On Fri, Aug 13, 2010 at 10:37 AM, Mark wrote: Is there some way I can count the number of rows in a CF.. CLI, MBean? Gracias I'm guessing you would advise against this? Any reaso

Re: Count rows

2010-08-13 Thread Jonathan Ellis
not without fetching all of them with get_range_slices On Fri, Aug 13, 2010 at 10:37 AM, Mark wrote: > Is there some way I can count the number of rows in a CF.. CLI, MBean? > > Gracias > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cass
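
A minimal sketch of what "fetching all of them" amounts to: page through the key range and count along the way. The `fetch_page` function is a hypothetical stand-in for a client's get_range_slices call, backed here by an in-memory sorted list; it is not a real client API, and the row keys and page size are assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical stand-in for a column family: row keys in storage order.
ROWS = sorted(f"key{i:04d}" for i in range(2500))


def fetch_page(start_key, count):
    """Hypothetical stand-in for get_range_slices: return up to `count`
    row keys starting at `start_key` (inclusive), in storage order."""
    i = bisect_left(ROWS, start_key)
    return ROWS[i:i + count]


def count_rows(page_size=1000):
    """Count rows by paging through the whole key range."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    total = 0
    start = ""  # empty start key means "from the beginning"
    while True:
        page = fetch_page(start, page_size)
        if start:
            # The start key is inclusive, so the first row of this page
            # repeats the last row of the previous page; skip it.
            page = page[1:]
        if not page:
            break
        total += len(page)
        start = page[-1]
    return total


print(count_rows())
```

Every row still has to be read and shipped to the client, which is why this approach scales poorly to billions of rows.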

Re: a plea not to remove rowsize warning

2010-08-13 Thread Jonathan Ellis
added key to in_memory_compaction_limit threshold log: logger.info(String.format("Compacting large row %s (%d bytes) incrementally", FBUtilities.bytesToHex(rows.get(0).getKey().key), rowSize)); On Wed, Aug 11, 2010 at 4:11 PM, Edward Capriolo wrote: > Hello all, > > I recently poste

Count rows

2010-08-13 Thread Mark
Is there some way I can count the number of rows in a CF.. CLI, MBean? Gracias

Re: Migration from .6 to.7

2010-08-13 Thread Jonathan Ellis
yes. NEWS.txt On Fri, Aug 13, 2010 at 10:31 AM, Claire Chang wrote: > I was wondering if there will be a document on how to do it?

Migration from .6 to.7

2010-08-13 Thread Claire Chang
I was wondering if there will be a document on how to do it? Sent from my iPhone

Monitoring cassandra using Munin...how to get a hourly graph?

2010-08-13 Thread Simon Reavely
Hi, For those of you using Munin to monitor Cassandra's JMX stats I wondered if anyone had figured out how to get hourly graphs. By default we are getting daily, weekly, monthly, yearly but for our performance testing we really need hourly and looking on the message boards we can't figure out how

Re: Cassandra and Pig

2010-08-13 Thread Christian Decker
Wow, that was extremely quick, thanks Stu :-) I'm still a bit unclear on what the pig_cassandra script does. It sets some variables (PIG_CLASSPATH for one) and then starts the original pig binary but injects some libraries in it (libthrift and pig-core) but strangely not the cassandra loadfunc, why

Re: TimeUUID vs Epoch

2010-08-13 Thread Mark
when using TimeUUID? For example I am storing a bunch of records keyed by the current date "20100813". Each column is a TimeUUID. If I wanted to get all the columns that fall between some arbitrary times, say 6am - 9am, can I do that? Using Long I can just use a start of "12817044

Re: TimeUUID vs Epoch

2010-08-13 Thread Sylvain Lebresne
nd range for querying across times. Can this be >>> accomplished using TimeUUID? >>> >>> Would someone also explain how TimeUUID is actually sorted? I'm confused >>> on >>> how it's actually compared. Thanks! >>> >>> >>>

RE: Cassandra and Pig

2010-08-13 Thread Stu Hood
That error is coming from the frontend: the jars must also be on the local classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up $PIG_CLASSPATH. -Original Message- From: "Christian Decker" Sent: Friday, August 13, 2010 11:30am To: user@cassandra.apache.org Subject: Cassand

Re: TimeUUID vs Epoch

2010-08-13 Thread Mark
So long story short you can give a start/end range when using TimeUUID? For example I am storing a bunch of records keyed by the current date "20100813". Each column is a TimeUUID. If I wanted to get all the columns that fall between some arbitrary times, say 6am - 9am, can I do that? Usi

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
On Fri, Aug 13, 2010 at 9:48 AM, Oleg Anastasjev wrote: > Benjamin Black b3k.us> writes: > >> > 3. I waited for the data to replicate, which didn't happen. >> >> Correct, you need to run nodetool repair because the nodes were not >> present when the writes came in.  You can also use a higher >> c

Re: Data Distribution / Replication

2010-08-13 Thread Oleg Anastasjev
Benjamin Black b3k.us> writes: > > 3. I waited for the data to replicate, which didn't happen. > > Correct, you need to run nodetool repair because the nodes were not > present when the writes came in. You can also use a higher > consistency level to force read repair before returning data, whi

Key Index/Key Slices

2010-08-13 Thread Mark
Keys are indexed in Cassandra but are they ordered? If so, how? Do Key Slices work like Range Slices for columns, i.e., can I give a start and end range? It seems like if they are not ordered (which I think is true) then performing KeyRanges would be somewhat inefficient or at least not as effic

Re: TimeUUID vs Epoch

2010-08-13 Thread Sylvain Lebresne
As long as time sorting is involved, you'll get the same ordering if you use Epoch/Long or TimeUUID. The difference is between the ties. If, when you insert two values at the exact same time, you want to have only one stay, then you want LongType. If however you don't want to merge such inserts, then yo
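
A small sketch of the tie behaviour described above, using only Python's standard `uuid` module. Plain dicts stand in for the columns of a row; nothing here talks to Cassandra or reproduces its comparators. The `time_uuid` helper, the node value, the clock sequence values, and the timestamp are all illustrative assumptions.

```python
import uuid

# Number of 100 ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01), as used by version 1 UUIDs.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid(unix_micros, clock_seq, node=0x123456789ABC):
    """Build a version 1 UUID for a given timestamp (illustrative helper).

    `clock_seq` and `node` are arbitrary values that tell apart UUIDs
    sharing the same timestamp.
    """
    ts = unix_micros * 10 + UUID_EPOCH_OFFSET
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


t = 1_281_700_000_000_000  # arbitrary instant, microseconds since the Unix epoch

# Long column names: two inserts at the same instant share one name,
# so the second write replaces the first.
by_long = {}
by_long[t] = "first"
by_long[t] = "second"

# TimeUUID column names: same timestamp but distinct names, so both stay.
a = time_uuid(t, clock_seq=1)
b = time_uuid(t, clock_seq=2)
by_uuid = {a: "first", b: "second"}

# Time ordering is the same either way: an earlier instant sorts first.
c = time_uuid(t - 1, clock_seq=3)
ordered = sorted([a, b, c], key=lambda u: u.time)

print(len(by_long), len(by_uuid), ordered[0] == c)
```

The two UUIDs carry an identical timestamp yet remain separate entries, while the Long-keyed entry is overwritten.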

TimeUUID vs Epoch

2010-08-13 Thread Mark
I'm a little confused on when I should be using TimeUUID vs Epoch/Long when I want columns ordered by time. I know it sounds strange and the obvious choice should be TimeUUID but I'm not sure why that would be preferred over just using the Epoch stamp? They pretty much seem to accomplish the sa

Cassandra and Pig

2010-08-13 Thread Christian Decker
Hi all, I'm trying to get Pig to read data from a Cassandra cluster, which I thought trivial since Cassandra already provides me with the CassandraStorage class. Problem is that once I try executing a simple script like this: register /path/to/pig-0.7.0-core.jar;register /path/to/libthrift-r91713

Re: 0.7 CLI w/TSocket

2010-08-13 Thread Mark
On 8/13/10 7:09 AM, Jonathan Ellis wrote: if you turn off framed mode (by setting the transport size to 0) then you need to use the unframed option with cli On Thu, Aug 12, 2010 at 10:20 PM, Mark wrote: On 8/12/10 9:14 PM, Jonathan Ellis wrote: Works fine here. bin/cassandra-cl

RE: error using get_range_slice with random partitioner

2010-08-13 Thread Adam Crain
David, This is much like the behavior I saw... I thought that I might be doing something wrong, but I haven't had the time to check out other clients' iteration implementations. What client are you using? -Adam -Original Message- From: David McIntosh [mailto:da...@radiotime.com] Sent: Thu

Re: 0.7 CLI w/TSocket

2010-08-13 Thread Jonathan Ellis
if you turn off framed mode (by setting the transport size to 0) then you need to use the unframed option with cli On Thu, Aug 12, 2010 at 10:20 PM, Mark wrote: > On 8/12/10 9:14 PM, Jonathan Ellis wrote: >> >> Works fine here. >> >> bin/cassandra-cli --host localhost --port 9160 >> Connected

Re: How does cfstats calculate Row Size?

2010-08-13 Thread Julie
Jonathan Ellis gmail.com> writes: > > Right, row stats in 0.6 are just "what I've seen during the > compactions that happened to run since this node restarted last." > > 0.7 has persistent (and more fine-grained) statistics. > > > I'm guessing (haven't read this part of the source) that the m

Re: how to retrieve data from supercolumns by phpcassa ?

2010-08-13 Thread lisek
OK, I dealt with it. There was a bug in phpcassa and now I can do it this way: get->('client', UUID::convert('2a3909c0-a612-11df-b27e-346336336631', UUID::FMT_STRING, UUID::FMT_BINARY)) If someone is using phpcassa and wants to make it work, please let me know and I'll post the solution.
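
For readers outside PHP, a rough Python equivalent of the string-to-binary conversion in the call above, using only the standard `uuid` module. It illustrates the conversion itself and assumes nothing about phpcassa's internals.

```python
import uuid

text = "2a3909c0-a612-11df-b27e-346336336631"

# Parse the textual form and take its 16-byte binary representation.
parsed = uuid.UUID(text)
raw = parsed.bytes

# Converting back from the raw bytes yields the original string.
round_trip = str(uuid.UUID(bytes=raw))

print(len(raw), parsed.version, round_trip == text)
```

The 16 raw bytes, not the 36-character string, are what the conversion hands to the client call.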

Index feature in 0.7

2010-08-13 Thread Carlos Sanchez
All, I was wondering if I could get some information (link / pdf) about the new [column] indices in Cassandra for version 0.7 Thanks a lot, Carlos

Re: SV: how to retrieve data from supercolumns by phpcassa ?

2010-08-13 Thread lisek
Thanks Justus, I'll check it