Re: error using get_range_slice with random partitioner

2010-08-09 Thread Thomas Heller
> w using the API incorrectly
> 2) I am the only one encountering a bug
>
> My money is on 1) of course. I can check the thrift API against what my
> Scala client is calling under the hood.
>
> -Adam
>
>
> -Original Message-
> From: th.hel...@gmail.com on beha

Re: Columns limit

2010-08-07 Thread Thomas Heller
> Ok, I think the part I was missing was the concatenation of the key and
> partition to do the look ups. Is this the preferred way of accomplishing
> needs such as this? Are there alternative ways?

Depending on your needs you can concat the row key or use super columns.

> How would one then "qu
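A minimal sketch of the row-key concatenation idea, assuming a log ColumnFamily keyed by source and day; the names are illustrative, not from the thread:

    # One row per remote_addr and day keeps any single row
    # comfortably under the per-row column limit.
    def row_key(remote_addr, time)
      "#{remote_addr}:#{time.strftime('%Y%m%d')}"
    end

    row_key('10.0.0.1', Time.utc(2010, 8, 6))  # => "10.0.0.1:20100806"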

Re: error using get_range_slice with random partitioner

2010-08-07 Thread Thomas Heller
On Sat, Aug 7, 2010 at 11:41 AM, Peter Schuller wrote:
>> Remember the returned results are NOT sorted, so whenever you are
>> dropping the first by default, you might be dropping a good one. At
>> least that would be my guess here.
>
> Sorry, I may be forgetting something about this thread, bu
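A short sketch of the caveat quoted above, assuming `rows` is one batch of results from get_range_slices and that each result exposes its key as `row.key`: under RandomPartitioner a batch comes back in token order rather than key order, so the only safe row to drop is the one matching the previous start_key, not whichever happens to come first:

    # Keep every row except the repeat of the previous start key.
    def fresh_rows(rows, previous_start_key)
      rows.reject { |row| row.key == previous_start_key }
    end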

Re: Columns limit

2010-08-06 Thread Thomas Heller
> Thanks for the suggestion.
>
> I've somewhat understood all that; the point where my head begins to explode
> is when I want to figure out something like
>
> Continuing with your example: "Over the last X amount of days give me all
> the logs for remote_addr:XXX".
> I'm guessing I would need to c
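Sketching that "last X days" lookup under the day-partitioned layout discussed in this thread: with one row per address and day, the query reduces to computing X row keys and fetching those rows in one multiget. The key format and helper name are assumptions for illustration:

    # Row keys for the last `days` days, newest first.
    def keys_for_last_days(remote_addr, days)
      today = Time.now.utc
      (0...days).map do |n|
        "#{remote_addr}:#{(today - n * 86_400).strftime('%Y%m%d')}"
      end
    end

    keys_for_last_days('10.0.0.1', 3)
    # => e.g. ["10.0.0.1:20100806", "10.0.0.1:20100805", "10.0.0.1:20100804"]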

Re: Columns limit

2010-08-06 Thread Thomas Heller
Howdy, thought I'd jump in here. I did something similar, meaning I had lots of items coming in per day and wanted to somehow partition them to avoid running into the column limit (it was also logging related). The solution was pretty simple: log data is immutable, so no SuperColumn is needed. ColumnFamil
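A hedged sketch of the shape that layout takes: a plain (non-super) ColumnFamily whose day-partitioned rows collect immutable entries, one column per log line, under a time-sortable column name. The naming scheme below is an assumption, not the poster's actual schema:

    require 'digest/md5'

    # Build one column for a log line: the timestamp prefix keeps
    # columns in time order within the row, and the short MD5 suffix
    # disambiguates entries that share a timestamp.
    def log_column(time, line)
      name = "#{time.to_i}:#{Digest::MD5.hexdigest(line)[0, 8]}"
      { name => line }
    end

    log_column(Time.utc(2010, 8, 6, 12, 0), '10.0.0.1 GET /index.html')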

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain wrote:
> I took this approach... reject the first result of subsequent get_range_slice
> requests. If you look back at the output I posted (below) you'll notice that not
> all of the 30 keys [key1...key30] get listed! The iteration dies and can't
> proceed
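For reference, a sketch of the paging pattern this thread is circling around, assuming a hypothetical `fetch_range(start_key, count)` wrapper around get_range_slices that returns key strings in batch order. The last key of each batch seeds the next call and is skipped exactly once; iteration stops only when a batch comes back short, which is what keeps keys from going missing:

    def each_key(count = 100)
      start_key = ''
      loop do
        batch = fetch_range(start_key, count)
        # The previous start key reappears in the next batch; skip only it.
        batch.each { |key| yield key unless key == start_key }
        break if batch.size < count
        start_key = batch.last
      end
    end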

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
> Another way to do it is to filter results to exclude columns received
> twice due to being on iteration end points.

Well, it depends on the size of your rows; keeping lists of 1mil+ column names will eventually become really slow (at least in ruby).

> This is useful because it is not always
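To make the memory point concrete, a sketch contrasting the two dedupe strategies when paging through a wide row, assuming a hypothetical `fetch_columns(start, count)` wrapper around get_slice that returns column names in comparator order. Remembering only the previous page's last name is O(1), whereas accumulating every name ever seen grows without bound on 1mil+ column rows:

    def each_column(page = 1000)
      last_name = ''
      loop do
        names = fetch_columns(last_name, page)
        # Only the repeated page boundary needs filtering, not all history.
        names.each { |n| yield n unless n == last_name }
        break if names.size < page
        last_name = names.last
      end
    end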

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
>>>                       $value =
>>>                           $one_res->{'columns'}->[$i]->{'column'}->{'value'};
>>>                       if (!exists($returned_keys->{$next_start_key}))
>>>                       {
>>>

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
Wild guess here, but are you using start_token/end_token here when you should be using start_key? Looks to me like you are trying end_token = ''.

HTH,
/thomas

On Thursday, August 5, 2010, Adam Crain wrote:
> Hi,
>
> I'm on 0.6.4. Previous tickets in the JIRA and searching the web indicated
> tha
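To illustrate the guess, a sketch of the 0.6-era Thrift KeyRange as the generated Ruby bindings expose it (module name as in the fauna gem's generated code; treat the details as assumptions): a range is given either by keys or by tokens, never a mix, and with keys the empty string means "unbounded", whereas an empty end_token is not a valid token at all:

    range = CassandraThrift::KeyRange.new(
      :start_key => '',   # '' = start at the first row...
      :end_key   => '',   # ...and '' = no upper bound
      :count     => 100
    )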

Re: Iterate all keys - doing it as the faq fails for me :(

2010-07-13 Thread Thomas Heller
I'm not entirely sure, but I think you can only use get_range_slices with start_key/end_key on a cluster using OrderPreservingPartitioner. Don't know if that is intentional or buggy like Jonathan suggests, but I saw the same "duplicates" behaviour when trying to iterate all rows using RP and start_key/
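A quick illustration of why key-based ranges misbehave under RandomPartitioner: rows are placed and scanned by the MD5 token of the key, so a scan visits rows in digest order, which in general differs from lexical key order, and a key-ordered iteration appears to wrap and repeat rows:

    require 'digest/md5'

    # Rows are visited in the order of these digests, not in the
    # order of the key names; that mismatch produces the duplicates.
    %w[key1 key2 key3].each do |k|
      puts "#{k} -> #{Digest::MD5.hexdigest(k)}"
    end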

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Thomas Heller
Hey,

> Is
> this possible in 0.6.0? (Note: for the next startToken, I was just planning
> on computing the MD5 digest of the last key directly since I'm accessing
> Cassandra through Thrift.)

Can't speak for 0.6.0, but it works for 0.6.3. Just implemented this in ruby (minus the parallel par
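A sketch of the startToken computation mentioned in the quote, assuming 0.6-era RandomPartitioner semantics: the token of a key is its MD5 digest read as a signed 128-bit big integer with the absolute value taken, rendered in decimal. Treat the signed-reinterpretation detail as my reading of the 0.6 source rather than gospel:

    require 'digest/md5'

    def token_for(key)
      i = Digest::MD5.hexdigest(key).to_i(16)
      # Reinterpret the 128-bit value as signed, then take abs,
      # mirroring BigInteger(md5).abs() on the server side.
      i -= (1 << 128) if i >= (1 << 127)
      i.abs.to_s
    end

    token_for('key30')  # decimal token usable as the next start_token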

Re: Thrift Client on Ruby, does it need compiled bindings? (or anything else to make it faster?)

2010-06-20 Thread Thomas Heller
Hey there, I saw the same thing and it worried me a little bit, but honestly it's just ONE core of your CPU capping out. You could either add some threading to your insert script, or just spin up another process and see your insert rate nearly double. Although C might add some speed, I found that i
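The "another process" suggestion in minimal form, with `insert_batch` standing in for whatever insert loop the script already has: a single MRI process tops out at one core, so forking a second worker is the quickest way to put the next core to work:

    pids = 2.times.map do
      Process.fork do
        insert_batch  # hypothetical: your existing insert loop
      end
    end
    pids.each { |pid| Process.wait(pid) }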

Re: Learning-by-doing (also announcing a new Ruby Client Codename: "Greek Architect")

2010-06-20 Thread Thomas Heller
Hey, of course I know about

http://github.com/fauna/cassandra
http://github.com/nzkoz/cassandra_object

and they are awesome, and without them I probably would have had a much harder time learning about Cassandra, since it's always good to have some way to actually throw some stuff in there. The ca

Learning-by-doing (also announcing a new Ruby Client Codename: "Greek Architect")

2010-06-18 Thread Thomas Heller
Howdy! So, last week I finally got around to playing with Cassandra. After a while I understood the basics. To test this assumption I started working on my own client implementation, since "Learning-by-doing" is what I do and existing Ruby clients (which are awesome) already abstracted too much for

Re: Beginner Assumptions

2010-06-13 Thread Thomas Heller
Hey, I'm sorry, I think I didn't make myself clear enough. I'm using cassandra only to store the _results_ (the calculated time series), not the source data. Also, using "Beginner Assumptions" as the subject probably wasn't the best choice, since I'm more interested in the inner workings of cassandra

Beginner Assumptions

2010-06-12 Thread Thomas Heller
Hey, I've been planning to play around with Cassandra for quite some time and finally got around to it. I like what I've seen/used so far a lot, but my SQL-brain keeps popping up and trying to convince me that SQL is fine. Anyways, I want to store some (a lot of) Time Series data in Cassandra and