Differencing test results

2010-07-28 Thread Thorvaldsson Justus
I am testing on one node only right now, simply adding data and reading it. There is no real problem, but I get differing results from the tests depending on something, and I would like to know what it is. What is happening between two "points in time" -->Here is a point with Slower result

Re: Consequences of Cassandra key NOT unique

2010-07-28 Thread Thorvaldsson Justus
You insert 500 rows with key "x" and 1000 rows with key "y". You make a query getting all rows. It will only show two rows, the ones with the latest timestamps. /Justus From: Rana Aich [mailto:aichr...@gmail.com] Sent: 29 July 2010 08:23 To: user@cassandra.apache.org Subject: Re: Consequences

Re: Consequences of Cassandra key NOT unique

2010-07-28 Thread Rana Aich
Thanks for your reply! I thought in that case a new row would be inserted with a new timestamp and Cassandra would report the new row. But how will this affect my range query? On Wed, Jul 28, 2010 at 7:03 PM, Benjamin Black wrote: > If you write new data with a key that is already present, the ex

Re: Someone has overwritten the CassandraLimitations wiki page?

2010-07-28 Thread 豊月
This page is probably not spam; it has been translated into Chinese. We should notify AndroidYou to make "CassandraLimitations_CH". 2010/7/29 Ashwin Jayaprakash : > > I see spam on this page - > http://wiki.apache.org/cassandra/CassandraLimitations > > Look at this - > http://wiki.apache.org/cassandr

Re: NoServer Available

2010-07-28 Thread Daniel Bernstein
Thanks for the help Aaron. I sorted it out: the problem was that I was using the latest version of pycassa against cassandra 0.6.x. When I downloaded the 0.3.0 pycassa and used the previous api, all worked properly. Cheers, db On Wed, Jul 28, 2010 at 2:44 PM, Aaron Morton wrote: > Just check

Index/Count/Order by syntax

2010-07-28 Thread Mark
I know there is no native support for "order by", "group by" etc but I was wondering how it could be accomplished with some custom indexes? For example, say I have a list of word counts like (notice 2 words have the same count): "cassandra" => 100 "foo" => 999 "bar" => 1 "
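One way to picture the custom-index idea raised in this question is a second mapping keyed by count, maintained alongside the word-to-count data. The following is a minimal Python sketch, not Cassandra API code: plain dicts stand in for column families, the function names `build_count_index` and `order_by_count` are hypothetical, and the entry "baz" is invented here only so that two words share a count.

```python
from collections import defaultdict

# Word counts from the question; "baz" is a made-up entry so two words share a count.
word_counts = {"cassandra": 100, "foo": 999, "bar": 1, "baz": 100}

def build_count_index(counts):
    """Invert word -> count into count -> sorted list of words."""
    index = defaultdict(list)
    for word, count in counts.items():
        index[count].append(word)
    return {count: sorted(words) for count, words in index.items()}

def order_by_count(index, descending=True):
    """Walk the index in count order, yielding (word, count) pairs."""
    for count in sorted(index, reverse=descending):
        for word in index[count]:
            yield word, count

count_index = build_count_index(word_counts)
ranked = list(order_by_count(count_index))
print(ranked)
```

Because the index is keyed by count, words with equal counts land in the same entry, which is also the grouping a "group by count" would need.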

Re: any better way to retrieve data than using get_range_slices

2010-07-28 Thread Aaron Morton
If you want to process millions of rows at a time, take a look at the Hadoop and Pig integration. Try the Cloudera distro of Hadoop, CDH3; it includes Pig. Pig is an SQL-like language for doing large-scale data analysis that compiles down to Java that is run in Hadoop jobs. http://hadoop.apac

Re: Evaluating Cassandra for our use case

2010-07-28 Thread Aaron Morton
Have you considered Redis http://code.google.com/p/redis/? It may be more suited to the master-slave configuration you are after. - You can have a master to write to, then slave to a slave master, then your web heads run a local redis and slave from the slave master. - Backup at the master or the s

Re: Consequences of Cassandra key NOT unique

2010-07-28 Thread Benjamin Black
If you write new data with a key that is already present, the existing columns are overwritten or new columns are added. There is no way to cause a duplicate key to be inserted. On Wed, Jul 28, 2010 at 6:16 PM, Rana Aich wrote: > Hello, > I was wondering what may the pitfalls in Cassandra when t
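The behaviour described in this reply can be pictured with a toy model. The sketch below is plain Python, not Cassandra code: a dict of dicts stands in for a column family, the `write` helper is hypothetical, and resolving equal timestamps in favour of the later write is a simplification made for the example.

```python
def write(store, key, columns, timestamp):
    """Apply a write: per column, the value with the newest timestamp wins."""
    row = store.setdefault(key, {})
    for name, value in columns.items():
        current = row.get(name)
        # Simplification: on equal timestamps the later write replaces the earlier one.
        if current is None or timestamp >= current[1]:
            row[name] = (value, timestamp)

store = {}
write(store, "x", {"name": "first"}, timestamp=1)
write(store, "x", {"name": "second"}, timestamp=2)          # overwrites column "name"
write(store, "x", {"email": "a@example.com"}, timestamp=3)  # adds a new column
write(store, "y", {"name": "other"}, timestamp=4)

print(len(store))   # number of distinct row keys
print(store["x"])
```

However many writes target key "x", the model holds a single row for it; repeated writes only change or extend that row's columns.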

any better way to retrieve data than using get_range_slices

2010-07-28 Thread Ken Matsumoto
Hi all, Is there any better way to retrieve data from Cassandra than using get_range_slices? Now I'm going to port some programs using MySQL to Cassandra. The program query is like below: "select * from Table_A where date > 1/1/2008 and date < 12/31/2009 and locationID = 1" The result of
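A layout often suggested for this kind of query is one row per locationID with columns sorted by date, so that the date condition becomes a column slice within a single row rather than a scan across row keys. The thread snippet does not say which design was chosen, so the following is only a Python sketch under that assumption; the class `LocationRows` and its methods are hypothetical and do not call any Cassandra client.

```python
from bisect import bisect_left, bisect_right, insort
from datetime import date

class LocationRows:
    """Toy model: one row per locationID, columns kept sorted by date."""

    def __init__(self):
        self._rows = {}

    def insert(self, location_id, day, value):
        columns = self._rows.setdefault(location_id, [])
        insort(columns, (day, value))  # keeps (day, value) pairs in date order

    def slice(self, location_id, start, finish):
        """Return columns with start < day < finish, like the SQL in the question."""
        columns = self._rows.get(location_id, [])
        days = [day for day, _ in columns]
        lo = bisect_right(days, start)   # first index strictly after start
        hi = bisect_left(days, finish)   # first index at or after finish
        return columns[lo:hi]

table = LocationRows()
table.insert(1, date(2007, 12, 31), "too early")
table.insert(1, date(2008, 6, 15), "in range")
table.insert(1, date(2009, 12, 31), "boundary, excluded")
table.insert(2, date(2008, 6, 15), "other location")

result = table.slice(1, date(2008, 1, 1), date(2009, 12, 31))
print(result)
```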

Consequences of Cassandra key NOT unique

2010-07-28 Thread Rana Aich
Hello, I was wondering what the pitfalls may be in Cassandra when the Key value is not UNIQUE? Will it affect the range query performance? Thanks and regards, raich

cassandra 0.6.1 read returns wrong data?

2010-07-28 Thread Jianing Hu
We recently migrated part of our MySQL database to a 3-node Cassandra cluster with a replication factor of 3. A couple of days ago we noticed that Cassandra sometimes returns the wrong data. Not corrupted data, but data for a different key than the one being asked for. This error appears to be random

Re: iterating over all rows keys gets duplicate key returns

2010-07-28 Thread Dave Viner
Just as a followup, here's what seems to be the resolution: 1. 0.6.4 should fix this problem. 2. Using OPP as the DHT should solve it as well. 3. Prior to 0.6.4, when using RandomPartitioner as the DHT, there's no good way to guarantee that you see *all* row keys for a column family. Strategies t
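The strategies in this message are cut off, so the following is not a reconstruction of them. It is only a Python sketch of one generic client-side workaround: page through the keys and skip any key already seen. The `fetch_page` callable is a hypothetical stand-in for whatever range call the client uses, and the `ring` list fakes a server that returns one key twice.

```python
def iterate_unique_keys(fetch_page, page_size):
    """Page through row keys, yielding each key once even if the server repeats it."""
    seen = set()  # grows with the number of keys; fine for a sketch, costly at scale
    start = ""
    while True:
        page = fetch_page(start, page_size)
        new_in_page = 0
        for key in page:
            if key not in seen:
                seen.add(key)
                new_in_page += 1
                yield key
        # Stop on a short page, or when a full page brought nothing new.
        if len(page) < page_size or new_in_page == 0:
            break
        start = page[-1]  # next page starts (inclusively) at the last key returned

# Hypothetical key order with "k1" returned twice.
ring = ["k3", "k1", "k1", "k4", "k2"]

def fake_fetch_page(start, count):
    begin = ring.index(start) if start else 0
    return ring[begin:begin + count]

unique_keys = list(iterate_unique_keys(fake_fetch_page, page_size=3))
print(unique_keys)
```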

RE: Evaluating Cassandra for our use case

2010-07-28 Thread Daniel Kluesing
>Is it possible to configure Cassandra in such a way that a node only ever asks itself for the data, and if so what sort of effect will that have on read performance? Check out the RingCache class which lets you make your clients smart enough to ask the right server. (Also, if all nodes have a

Evaluating Cassandra for our use case

2010-07-28 Thread Russ Brown
Hi, I'm currently looking at NoSQL solutions to replace a bespoke system that we currently have in place. Currently I think the best fit is Cassandra, but I would like to get some feedback from those who know it better before spending more time on it. Our current system is geared to allowing our

Re: importance of key cache vs row cache

2010-07-28 Thread Rob Coli
On 7/28/10 12:26 PM, YES Linux wrote: i was wondering what the trade offs were between the key cache and row cache? which is more important from a read? if you have a large row cache can your key cache be small? - The row cache is a superset of the key cache. If you have a row cache on a CF,

Re: iterating over all rows keys gets duplicate key returns

2010-07-28 Thread Rob Coli
On 7/28/10 2:43 PM, Dave Viner wrote: Hi all, I'm having a strange result in trying to iterate over all row keys for a particular column family. The iteration works, but I see the same row key returned multiple times during the iteration. I'm using cassandra 0.6.3, and I've put the code in use

Re: Cassandra vs MongoDB

2010-07-28 Thread Jeremy Hanna
> "As a result, we designed and built Flume... > (I wonder if this could deliver into Cassanda :) ) Yes - apparently it's pretty easy to do - I was thinking of doing it but haven't found the time yet. https://issues.cloudera.org//browse/FLUME-20 On Jul 28, 2010, at 4:30 PM, Aaron Morton wrote:

Re: iterating over all rows keys gets duplicate key returns

2010-07-28 Thread Jeremy Hanna
Yes, didn't know if you saw the reply in the channel. This bug has been fixed in the forthcoming 0.6.4 release. It was bug CASSANDRA-1042 - https://issues.apache.org/jira/browse/CASSANDRA-1042 (0.6.4 will be out really soon) On Jul 28, 2010, at 4:43 PM, Dave Viner wrote: > Hi all, > > I'm ha

Re: NoServer Available

2010-07-28 Thread Aaron Morton
Just checking the obvious: you're connecting to the local host, so is this code running on one of the machines with Cassandra installed? Second, assuming you're using the current GitHub source, put a break point in connection.py at line 191 to see what the actual error is when it tries to connect.

iterating over all rows keys gets duplicate key returns

2010-07-28 Thread Dave Viner
Hi all, I'm having a strange result in trying to iterate over all row keys for a particular column family. The iteration works, but I see the same row key returned multiple times during the iteration. I'm using cassandra 0.6.3, and I've put the code in use at http://pastebin.com/zz5xJQ8f Using

Re: repair failed or stopped after 7-8 hours?

2010-07-28 Thread Aaron Morton
Did you start the repair on all nodes at once or one at a time? Take a look at the streams on the nodes, using either nodetool -h localhost -p 8080 streams or the JMX interface. Check if the numbers are changing. Aaron On 28 Jul, 2010, at 08:14 AM, Michael Andreasen wrote: I've started repair on 6 n

Re: Cassandra vs MongoDB

2010-07-28 Thread Aaron Morton
If you are looking to store web logs and then do ad hoc queries you might/should be using Hadoop (depending on how big your logs are) I agree, take a look at the Cloudera Hadoop 3 distribution (CDH3); it includes an app called Flume for moving data... "As a result, we designed and built Flume. Flume is a distribu

importance of key cache vs row cache

2010-07-28 Thread YES Linux
I was wondering what the trade-offs are between the key cache and row cache? Which is more important for a read? If you have a large row cache, can your key cache be small? Here is some background to my questions: I have a data set that has a lot of random access for rows using get slices from

? how does one specify a supercolumn range?

2010-07-28 Thread james anderson
if i have data of the form Keyspace1 -> Super2 -> icecream -> 20100701 -> :vanille 100 :chocolade 2 :riesling-sorbet 900 20100702 -> :vanille 100 :chocolade 200 :riesling-sorbet 100 cake 20100701 -> :cheescake 2 :linzer 100 :apfel 2 20100702 -> :cheescake
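The data layout in this question can be pictured as nested mappings: row key, then supercolumn name, then subcolumns. The following is a minimal Python sketch of selecting a supercolumn range by name. It is not Thrift or Cassandra client code, the helper `supercolumn_range` is hypothetical, inclusive bounds are an assumption, and only the entries that are complete in the snippet are included.

```python
# Column family "Super2" as nested dicts: row key -> supercolumn -> subcolumns.
super2 = {
    "icecream": {
        "20100701": {"vanille": 100, "chocolade": 2, "riesling-sorbet": 900},
        "20100702": {"vanille": 100, "chocolade": 200, "riesling-sorbet": 100},
    },
    "cake": {
        "20100701": {"cheescake": 2, "linzer": 100, "apfel": 2},
    },
}

def supercolumn_range(column_family, row_key, start, finish):
    """Return supercolumns whose names fall in [start, finish], in name order."""
    row = column_family.get(row_key, {})
    return {name: row[name] for name in sorted(row) if start <= name <= finish}

selected = supercolumn_range(super2, "icecream", "20100702", "20100702")
print(selected)
```

Date-like names such as 20100701 sort correctly as strings because they are fixed-width, which is what makes a name range usable as a date range here.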

Re: cassandra summit, making videos?

2010-07-28 Thread Oleg Anastasjev
uncle mantis writes: > Why is everything always in California or Las Vegas? Because you can convince your employer to pay for your vacation near the ocean or the slots, I believe ;-)

Re: Cassandra vs MongoDB

2010-07-28 Thread Joseph Stein
If you are looking to store web logs and then do ad hoc queries you might/should be using Hadoop (depending on how big your logs are) While MongoDB has MapReduce (built in) it is there to simulate SQL GROUP BY and not for large scale analytics by any means. MongoDB uses a global read/write lock p

Re: Upgrading to Cassanda 0.7 Thrift Erlang

2010-07-28 Thread J T
Hi, That fixed the problem! I added the Framed option and like magic things have started working again. Example: thrift_client:start_link("localhost", 9160, cassandra_thrift, [ { framed, true } ] ) JT. On Tue, Jul 27, 2010 at 10:04 PM, Jonathan Ellis wrote: > trunk is using framed thrift c

Re: Cassandra vs MongoDB

2010-07-28 Thread Benjamin Black
They have approximately nothing in common. And, no, Cassandra is definitely not dying off. On Tue, Jul 27, 2010 at 8:14 AM, Mark wrote: > Can someone quickly explain the differences between the two? Other than the > fact that MongoDB supports ad-hoc querying I don't know what's different. It > al