Re: Cassandra on flash storage

2010-06-09 Thread Jeff Hammerbacher
Hey Boris, Could you post specifics? Last I checked, the commit log in Cassandra was designed to be run on a dedicated disk and thus should be writing sequentially. I wouldn't expect a significant speed boost unless you are running the commit log on a shared disk. When the number of SSTables corr

Re: http://voltdb.com/ ?

2010-06-09 Thread Charles Woerner / IMAP
I don't know, I'd ask them directly. When I looked at it I was more interested in the throughput and ACID compliance aspects of it. -- Thanks, Charles Woerner On Jun 9, 2010, at 2:56 PM, "Parsacala Jr, Nazario R. [Tech]" > wrote: So what is the size limit for voltdb ..? From: Charles W

Re: http://voltdb.com/ ?

2010-06-09 Thread Paul Prescod
I hope Cassandra is competitive with other solutions well before 50TB of data. There is a middle ground where you might choose one or the other. Just as there are areas where you might choose PostGres or Cassandra. They claim it will scale all the way up. Right now the likely dealbreaker will be i

RE: http://voltdb.com/ ?

2010-06-09 Thread Parsacala Jr, Nazario R. [Tech]
So what is the size limit for voltdb ..? From: Charles Woerner / IMAP [mailto:charleswoer...@gmail.com] Sent: Wednesday, June 09, 2010 5:55 PM To: user@cassandra.apache.org Subject: Re: http://voltdb.com/ ? Totally agree that Cassandra and voltdb fulfill different needs. I would say mysql clust

Re: http://voltdb.com/ ?

2010-06-09 Thread Charles Woerner / IMAP
Totally agree that Cassandra and voltdb fulfill different needs. I would say mysql cluster (ndbd) would be a more appropriate competitor. 50tb? Yes and no - it's designed to integrate with a system called vertica which does scale to that size, but as a stand alone system I don't believe

Re: http://voltdb.com/ ?

2010-06-09 Thread Ned Wolpert
As far as finding its competitors goes: if you need ACID compliance, Cassandra isn't in the list. If you need 50TB of data, is VoltDB in the list? On Wed, Jun 9, 2010 at 2:30 PM, Charles Woerner / IMAP < charleswoer...@gmail.com> wrote: > I would disagree with that assessment. My take is that Volt

Re: http://voltdb.com/ ?

2010-06-09 Thread Charles Woerner / IMAP
I would disagree with that assessment. My take is that Voltdb is a high throughput, fault tolerant transaction processing db as opposed to a caching system or key value store. It's easy to get hung up on the in-memory nature of it but I believe that it is both fault tolerant through redun

Re: Running a very small cluster

2010-06-09 Thread Per Olesen
> Why don't you run the benchmark contrib/stress.py to see what performance do > you get ? Didn't know about that one. Will give it a try. Thanks.

Re: Running a very small cluster

2010-06-09 Thread Jordan Pittier - Rezel
Hi, Regarding point c), you should ask yourself, "what is good performance for me?". The read performance mainly depends on how fast your hard drives are and how many rows you can maintain in cache. With such a small cluster, if you want "good" read performance, you better have fast hard drive an

Re: Quick help on Cassandra please: cluster access and performance

2010-06-09 Thread Per Olesen
On Jun 9, 2010, at 9:47 PM, li wei wrote: > Thanks a lot. > We have set READ ONE, WRITE ANY. Is this better than QUORUM in performance? Yes, but it gives weaker consistency guarantees. > Do you think the Cassandra cluster (with 2 or more nodes) should always be > faster than a single node, in reality and theo

Running a very small cluster

2010-06-09 Thread Per Olesen
Short question: Does Cassandra only *really* shine when running a cluster with lots of nodes? Same question in a lengthy version: If what I want to obtain from my Cassandra cluster is given as this: a) protection against data loss if nodes disk-crash b) good uptime, if servers become unavailable o

Re: Quick help on Cassandra please: cluster access and performance

2010-06-09 Thread li wei
Thanks a lot. We have set READ ONE, WRITE ANY. Is this better than QUORUM in performance? Thanks. Do you think the Cassandra cluster (with 2 or more nodes) should always be faster than a single node, in reality and in theory? Or does it depend? Thanks again! - Original Message From: Per

Re: Passing client as parameter

2010-06-09 Thread Steven Haar
C# On Wed, Jun 9, 2010 at 2:34 PM, Ran Tavory wrote: > Some languages have higher level clients that might help you. What language > are you using? > > On Jun 9, 2010 9:01 PM, "Steven Haar" wrote: > > What is the best way to pass a Cassandra client as a parameter? If you pass > it as a paramete

Re: Seeds and AutoBoostrap

2010-06-09 Thread Per Olesen
Okay. Cool actually. That clears up quite a bit for me :) On Jun 9, 2010, at 9:26 PM, Jonathan Ellis wrote: > right > > On Wed, Jun 9, 2010 at 11:29 AM, Per Olesen wrote: >> >> On Jun 9, 2010, at 1:00 PM, Ben Browning wrote: >> >>> There really aren't "seed nodes" in a Cassandra cluster. When

Re: Passing client as parameter

2010-06-09 Thread Ran Tavory
Some languages have higher level clients that might help you. What language are you using? On Jun 9, 2010 9:01 PM, "Steven Haar" wrote: What is the best way to pass a Cassandra client as a parameter? If you pass it as a parameter, do you also have to pass the transport in order to be able to clo

Re: Quick help on Cassandra please: cluster access and performance

2010-06-09 Thread Per Olesen
> How to set "write and read with QUORUM"? You set this on each Thrift API call you make from your Java code. See http://wiki.apache.org/cassandra/API > They are run physically separate hw (But same since they are VM) So they share a disk. I think this can have an influence. As I
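The trade-off between ONE and QUORUM discussed in this thread comes down to simple replica arithmetic. The following is a minimal sketch of that arithmetic only, not of the Thrift API; the function names are hypothetical and chosen for illustration.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 2  # the replication factor mentioned in this thread
    q = quorum(rf)
    print("QUORUM with RF=2 needs", q, "replicas")
    print("ONE/ONE overlaps:", reads_see_latest_write(1, 1, rf))
    print("QUORUM/QUORUM overlaps:", reads_see_latest_write(q, q, rf))
```

Note that with RF=2 a quorum is both replicas, so QUORUM behaves like ALL in a two-replica setup; that is part of why ONE is faster but gives weaker guarantees.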

Re: Seeds and AutoBoostrap

2010-06-09 Thread Jonathan Ellis
right On Wed, Jun 9, 2010 at 11:29 AM, Per Olesen wrote: > > On Jun 9, 2010, at 1:00 PM, Ben Browning wrote: > >> There really aren't "seed nodes" in a Cassandra cluster. When you >> specify a seed in a node's configuration it's just a way to let it >> know how to find the other nodes in the clus

Re: Quick help on Cassandra please: cluster access and performance

2010-06-09 Thread li wei
Thanks a lot. I set RF=2 in storage-conf.xml. How do I set "write and read with QUORUM"? They are run on physically separate hw (but the same, since they are VMs) - Original Message From: Per Olesen To: "user@cassandra.apache.org" Sent: Wed, June 9, 2010 2:20:43 PM Subject: Re: Quick help

Re: Cassandra won't start after node crash

2010-06-09 Thread Lucas Di Pentima
El 09/06/2010, a las 04:31, Peter Schuller escribió: >> I've had a server crash > > As jbellis points out there may be hardware issues, but if, in > particular, the crash in question was a power outage a very common > problem is running on a system which does not honor write barriers. > Was it a

Re: http://voltdb.com/ ?

2010-06-09 Thread AJ Slater
Its proper competitors are stuff like redis and memcached. On Fri, Jun 4, 2010 at 8:19 AM, Jones, Nick wrote: > I saw a tweet about claiming far better performance to Cassandra. After > following up, I found out it requires the entire DB to reside in memory > across the nodes. > > > > *Nick Jon

Re: Seeds and AutoBoostrap

2010-06-09 Thread Per Olesen
On Jun 9, 2010, at 1:00 PM, Ben Browning wrote: > There really aren't "seed nodes" in a Cassandra cluster. When you > specify a seed in a node's configuration it's just a way to let it > know how to find the other nodes in the cluster. A node functions the > same whether it is another node's seed

Re: Quick help on Cassandra please: cluster access and performance

2010-06-09 Thread Per Olesen
Hi Wei, > 1) I found this: the 2 node is slower (30%) than one node in both of write > and select. Is this normal? (In theory, 2 nodes should be faster than one?). > I am monitoring the 2 nodes and found they are working and flushing often (so, 2 > nodes work) Which consistency level are you using

Range Slices timing question

2010-06-09 Thread Carlos Sanchez
I have about a million rows (each row with 100 cols) of the form domain/!date/!id (e.g. gwm.com/!20100430/!CFRA4500) So I am interested in getting all the ids (all cols) for a particular domain/date (e.g. "gwm.ml.com/!20100430/!A" "gwm.ml.com/!20100430/!D"). I am looping in chunks of 6000 rows

Passing client as parameter

2010-06-09 Thread Steven Haar
What is the best way to pass a Cassandra client as a parameter? If you pass it as a parameter, do you also have to pass the transport in order to be able to close the connection? Is there any way to open or close the transport directly from the client? Essentially what I want to do is pass a Cassa

Re: Range search on keys not working?

2010-06-09 Thread Philip Stanhope
If you are using random partitioner, and you want to do an EXPENSIVE row scan ... I found that I could iterate using start_key="" end_key="" for first call ... and then all other calls you'd provide the start_key="LAST_KEY" from previous iteration. If you set count to 1000, then you'll get 1000
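The paging pattern described in this message (empty start and end keys on the first call, then the last key of each page as the next start key) can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `token`, `get_range_slice`, and `scan_all` are hypothetical helpers, not Cassandra client calls; MD5 ordering stands in for the random partitioner; and the start key is treated as inclusive, so the repeated first row of each later page is dropped.

```python
import hashlib


def token(key):
    # Stand-in for the random partitioner: rows are ordered by a hash of the key.
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def get_range_slice(store, start_key, count):
    """Toy range call: up to `count` keys in token order, start_key inclusive."""
    ordered = sorted(store, key=token)
    begin = 0 if start_key == "" else ordered.index(start_key)
    return ordered[begin:begin + count]


def scan_all(store, count=1000):
    """Walk every key by feeding the last key of a page back in as the next start.

    `count` must be at least 2, because one slot per later page is spent on
    the repeated start key.
    """
    seen = []
    start = ""
    while True:
        page = get_range_slice(store, start, count)
        if start != "":
            page = page[1:]  # drop the row already returned by the previous page
        if not page:
            break
        seen.extend(page)
        start = page[-1]
    return seen


if __name__ == "__main__":
    rows = {"row%d" % i for i in range(2500)}
    keys = scan_all(rows, count=1000)
    print(len(keys), "keys scanned, in token order rather than key order")
```

The keys come back in hash order, which is why the scan visits every row but says nothing about which key comes next.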

Re: Range search on keys not working?

2010-06-09 Thread Sylvain Lebresne
> I don't get what you're saying. If you want to loop over your entire range > of keys, you can do it with a range query, and start and finish will both be > "". Is there any scenario where you would want to do a range query where > start and/or finish do not equal "", if you use random partitionin

Re: Range search on keys not working?

2010-06-09 Thread David Boxenhorn
I don't get what you're saying. If you want to loop over your entire range of keys, you can do it with a range query, and start and finish will both be "". Is there any scenario where you would want to do a range query where start and/or finish do not equal "", if you use random partitioning? 2010

Re: Seeds and AutoBoostrap

2010-06-09 Thread Jonathan Ellis
If you need to add a new seed node, you should autobootstrap it first, then afterwards change it to a seed. The only point of being a seed is for *other* nodes after all. On Tue, Jun 8, 2010 at 11:57 PM, Per Olesen wrote: > Hi, > > Just a quick question on seed nodes and auto bootstrap. > > Am I

Re: Data loss and corruption

2010-06-09 Thread Jonathan Ellis
Evan was just doing a ritual CYA and saying "this is new technology" [about a year ago]. On Tue, Jun 8, 2010 at 10:45 PM, Hector Urroz wrote: > Hi all, > We're starting to prototype Cassandra for use in a production system and > became concerned about data corruption after reading the excellent a

Re: Too many ParNew's

2010-06-09 Thread Torsten Curdt
As promised on IRC we have also collected some information, as we are seeing (probably) the same problem. https://issues.apache.org/jira/browse/CASSANDRA-1177 On Wed, Jun 9, 2010 at 14:11, aaron morton wrote: > May be related to CASSANDRA-1014 > https://issues.apache.org/jira/browse/CASSANDRA-10

scans stopped returning values for some keys

2010-06-09 Thread Pawel Dabrowski
Hi, I'm using Cassandra to store some aggregated data in a structure like this: KEY - product_id SUPER COLUMN NAME - timestamp and in the super column, I have a few columns with actual data. I am using a scan operation to find the latest super column (start=Long.MAX_VALUE, reversed=true, count

Re: Range search on keys not working?

2010-06-09 Thread Philip Stanhope
I feel that there is a significant bit of confusion here. You CAN use start/finish when using get_range_slices with random partitioner. But you can't make any assumptions about what key will be next in the range which is the whole point of "random". If you do know a specific key that you care

Quick help on Cassandra please: cluster access and performance

2010-06-09 Thread li wei
Hi, Experts, I am using a 2-node Cassandra cluster. I am doing a load test (one connection, trying to insert and select many posts in a discussion tree). I need your help: 1) I found this: the 2-node cluster is slower (30%) than one node in both write and select. Is this normal? (In theory, 2 n

Re: Too many ParNew's

2010-06-09 Thread aaron morton
May be related to CASSANDRA-1014 https://issues.apache.org/jira/browse/CASSANDRA-1014 And this discussion http://www.mail-archive.com/user@cassandra.apache.org/msg02923.html Aaron On 6 Jun 2010, at 08:14, Peter Schuller wrote: >> INFO 17:54:18,567 GC for ParNew: 1522 ms, 69437384 reclaimed lea

Re: Re: Range search on keys not working?

2010-06-09 Thread David Boxenhorn
To use start and finish parameters at all, you need to use OPP. Start and finish parameters don't work if you don't use OPP, i.e. the result set won't be: start <= resultSet < finish 2010/6/9 Ben Browning > OPP stands for Order-Preserving Partitioner. For more information on > partitioners, loo

Re: Seeds and AutoBoostrap

2010-06-09 Thread Ben Browning
There really aren't "seed nodes" in a Cassandra cluster. When you specify a seed in a node's configuration it's just a way to let it know how to find the other nodes in the cluster. A node functions the same whether it is another node's seed or not. In other words, all of the nodes in a cluster are

Re: Re: Range search on keys not working?

2010-06-09 Thread Ben Browning
OPP stands for Order-Preserving Partitioner. For more information on partitioners, look here: http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner To do key range slices that use both start and finish parameters and retrieve keys in-order, you need to use an ordered partitioner - eit

Re: Cassandra on flash storage

2010-06-09 Thread Boris Shulman
Using an SSD for the commitlog device can boost performance when running Cassandra in batch mode for fsync operations. From my experience a write operation can be 10 times faster when using an SSD for the commitlog. On Mon, Jun 7, 2010 at 6:37 PM, Héctor Izquierdo wrote: > Hi everyone. > > I wanted to know

Re: Inserting new data, where the key points to a tombstone record.

2010-06-09 Thread Jools
On 9 June 2010 10:43, Sylvain Lebresne wrote: > > However, as a final point of clarification, is there a particular reason > > that insert does not raise an exception when trying to insert over an > > existing key, or when the key points to a tombstone record ? > > Inserting over an existing key

Re: Inserting new data, where the key points to a tombstone record.

2010-06-09 Thread Sylvain Lebresne
> However, as a final point of clarification, is there a particular reason > that insert does not raise an exception when trying to insert over an > existing key, or when the key points to a tombstone record ? Inserting over an existing key is an update of the record and in the Cassandra data mode

Re: Cassandra on flash storage

2010-06-09 Thread Lu Ming
For hard disk drives, the random IOPS numbers are primarily dependent on the storage device's random seek time. 7200RPM SATA drives - ~90 IOPS[citation needed] 10kRPM Serial Attached SCSI drives - ~ 140 IOPS[citation needed] 15kRPM Serial Attached SCSI drives - ~180 IOPS[citation needed] Intel X2

RE: Inserting new data, where the key points to a tombstone record.

2010-06-09 Thread Dr . Martin Grabmüller
Hi Jools, when using a consistency level other than ALL, there is no way Cassandra can tell whether a given key currently exists in a cluster. There may be several concurrent insert or delete operations for the key in progress which have not yet propagated to the node which tries to dete

Re: Inserting new data, where the key points to a tombstone record.

2010-06-09 Thread Jools
Hi Martin, Many thanks for the succinct, and clear response. I've got some pointers to move me in the right direction, many thanks. However, as a final point of clarification, is there a particular reason that insert does not raise an exception when trying to insert over an existing key, or when

Re: Duplicate a node (replication).

2010-06-09 Thread xavier manach
Perfect. It's exactly the information I needed. Thx Jonathan.

RE: Inserting new data, where the key points to a tombstone record.

2010-06-09 Thread Dr . Martin Grabmüller
Hi Jools, what happens in Cassandra with your scenario is the following: 1) insert new record -> the record is added to Cassandra's dataset (with the given timestamp) 2) delete record -> a tombstone is added to the data set (with the timestamp of the deletion, which should be larger
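The insert/delete sequence described in this message can be sketched as a toy last-write-wins store in which a delete is just another write carrying a tombstone marker. This is a simplified model, not Cassandra's storage engine: `ToyStore` and `TOMBSTONE` are hypothetical names, and the rule that an equal timestamp leaves the existing cell in place is an assumption of this sketch.

```python
TOMBSTONE = object()  # marker value standing in for a deletion record


class ToyStore:
    """Single-node, in-memory model of timestamp-based conflict resolution."""

    def __init__(self):
        self.cells = {}  # key -> (timestamp, value or TOMBSTONE)

    def _write(self, key, timestamp, value):
        current = self.cells.get(key)
        # Only a strictly newer timestamp replaces the cell (sketch assumption).
        if current is None or timestamp > current[0]:
            self.cells[key] = (timestamp, value)

    def insert(self, key, value, timestamp):
        self._write(key, timestamp, value)

    def delete(self, key, timestamp):
        # A delete does not remove anything; it writes a tombstone.
        self._write(key, timestamp, TOMBSTONE)

    def get(self, key):
        current = self.cells.get(key)
        if current is None or current[1] is TOMBSTONE:
            return None
        return current[1]


if __name__ == "__main__":
    store = ToyStore()
    store.insert("user:1", "alice", timestamp=1)
    store.delete("user:1", timestamp=2)
    store.insert("user:1", "stale", timestamp=1)  # older than the tombstone
    print(store.get("user:1"))
    store.insert("user:1", "bob", timestamp=3)    # newer than the tombstone
    print(store.get("user:1"))
```

In this model a re-insert only becomes visible when its timestamp is larger than the tombstone's, which is why client-side timestamp errors matter so much.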

Inserting new data, where the key points to a tombstone record.

2010-06-09 Thread Jools
Hi, I've been developing a system against cassandra over the last few weeks, and I'd like to ask the community some advice on the best way to deal with inserting new data where the key is currently a tombstone record. As with all distributed systems, this is always a tricky thing to deal with, so

Re: Cassandra won't start after node crash

2010-06-09 Thread Peter Schuller
> I've had a server crash As jbellis points out there may be hardware issues, but if, in particular, the crash in question was a power outage a very common problem is running on a system which does not honor write barriers. Was it a power outage? -- / Peter Schuller

Re: Data loss and corruption

2010-06-09 Thread Ben Standefer
In my opinion the #1 risk for corruption is user/client error with the timestamps. Over time, Cassandra flushes data from memory to disks. After it flushes to disk, Cassandra doesn't go back to delete or modify that data. Because of this, deletes are performed by writing a "tombstone" to disk.