Hey Boris,
Could you post specifics? Last I checked, the commit log in Cassandra was
designed to be run on a dedicated disk and thus should be writing
sequentially. I wouldn't expect a significant speed boost unless you are
running the commit log on a shared disk.
When the number of SSTables corr
I don't know, I'd ask them directly. When I looked at it I was more
interested in the throughput and ACID compliance aspects of it.
--
Thanks,
Charles Woerner
On Jun 9, 2010, at 2:56 PM, "Parsacala Jr, Nazario R. [Tech]" wrote:
So what is the size limit for voltdb ..?
From: Charles W
I hope Cassandra is competitive with other solutions well before 50TB
of data. There is a middle ground where you might choose one or the
other. Just as there are areas where you might choose Postgres or
Cassandra.
They claim it will scale all the way up. Right now the likely
dealbreaker will be i
So what is the size limit for voltdb ..?
From: Charles Woerner / IMAP [mailto:charleswoer...@gmail.com]
Sent: Wednesday, June 09, 2010 5:55 PM
To: user@cassandra.apache.org
Subject: Re: http://voltdb.com/ ?
Totally agree that Cassandra and VoltDB fulfill different needs. I would say
MySQL Clust
Totally agree that Cassandra and VoltDB fulfill different needs. I
would say MySQL Cluster (ndbd) would be a more appropriate
competitor. 50TB? Yes and no - it's designed to integrate with a
system called Vertica, which does scale to that size, but as a
stand-alone system I don't believe
As far as finding its competitors goes: if you need ACID compliance, Cassandra
isn't in the list. If you need 50TB of data, is VoltDB in the list?
On Wed, Jun 9, 2010 at 2:30 PM, Charles Woerner / IMAP <
charleswoer...@gmail.com> wrote:
> I would disagree with that assessment. My take is that Volt
I would disagree with that assessment. My take is that VoltDB is a
high-throughput, fault-tolerant transaction-processing db as opposed
to a caching system or key-value store. It's easy to get hung up on
the in-memory nature of it, but I believe that it is both fault
tolerant through redun
> Why don't you run the benchmark contrib/stress.py to see what performance
> you get?
Didn't know about that one.
Will give it a try. Thanks.
Hi,
Regarding point c), you should ask yourself, "what is good performance for
me?". The read performance mainly depends on how fast your hard drives are
and how many rows you can maintain in cache. With such a small cluster, if
you want "good" read performance, you'd better have fast hard drives an
On Jun 9, 2010, at 9:47 PM, li wei wrote:
> Thanks a lot.
> We have set READ to ONE and WRITE to ANY. Is this better than QUORUM for performance?
Yes, but less safe in terms of consistency.
> Do you think a Cassandra cluster (with 2 or more nodes) should always be
> faster than a single node, in reality and in theo
Short question: Does Cassandra only *really* shine when running a cluster with
lots of nodes?
Same question in a lengthy version:
If what I want to obtain from my cassandra cluster is given as this:
a) protection against data loss if a node's disk crashes
b) good uptime, if servers become unavailable o
Thanks a lot.
We have set READ to ONE and WRITE to ANY. Is this better than QUORUM for performance?
Thanks.
Do you think a Cassandra cluster (with 2 or more nodes) should always be faster
than a single node, in reality and in theory?
Or it depends?
Thanks again!
- Original Message
From: Per
C#
On Wed, Jun 9, 2010 at 2:34 PM, Ran Tavory wrote:
> Some languages have higher level clients that might help you. What language
> are you using?
>
> On Jun 9, 2010 9:01 PM, "Steven Haar" wrote:
>
> What is the best way to pass a Cassandra client as a parameter? If you pass
> it as a paramete
Okay. Cool actually. That clears up quite a bit for me :)
On Jun 9, 2010, at 9:26 PM, Jonathan Ellis wrote:
> right
>
> On Wed, Jun 9, 2010 at 11:29 AM, Per Olesen wrote:
>>
>> On Jun 9, 2010, at 1:00 PM, Ben Browning wrote:
>>
>>> There really aren't "seed nodes" in a Cassandra cluster. When
Some languages have higher level clients that might help you. What language
are you using?
On Jun 9, 2010 9:01 PM, "Steven Haar" wrote:
What is the best way to pass a Cassandra client as a parameter? If you pass
it as a parameter, do you also have to pass the transport in order to be
able to clo
> How do I set "write and read with QUORUM"?
You set this on each Thrift API call you make from your Java code. See
http://wiki.apache.org/cassandra/API
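By way of illustration, here is a minimal Java sketch of doing that against the raw
Thrift interface, assuming 0.6-era method signatures; the host/port, keyspace, column
family, key, and column names ("Keyspace1", "Standard1", "user1", "name") are all
placeholders, and higher-level clients wrap this differently.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class QuorumExample {
    public static void main(String[] args) throws Exception {
        // Open a raw Thrift connection (host/port are placeholders).
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("name".getBytes("UTF-8"));

        // The consistency level is an argument of every call, not a cluster-wide setting.
        client.insert("Keyspace1", "user1", path, "Alice".getBytes("UTF-8"),
                      System.currentTimeMillis(), ConsistencyLevel.QUORUM);
        ColumnOrSuperColumn read = client.get("Keyspace1", "user1", path,
                                              ConsistencyLevel.QUORUM);
        System.out.println(new String(read.getColumn().getValue(), "UTF-8"));

        socket.close();
    }
}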
> They run on physically separate hw (but really the same, since they are VMs)
So they share disk.
I think this can have an influence. As I
right
On Wed, Jun 9, 2010 at 11:29 AM, Per Olesen wrote:
>
> On Jun 9, 2010, at 1:00 PM, Ben Browning wrote:
>
>> There really aren't "seed nodes" in a Cassandra cluster. When you
>> specify a seed in a node's configuration it's just a way to let it
>> know how to find the other nodes in the clus
Thanks a lot.
I set RF=2 in storage-conf.xml.
How do I set "write and read with QUORUM"?
They run on physically separate hw (but really the same, since they are VMs)
- Original Message
From: Per Olesen
To: "user@cassandra.apache.org"
Sent: Wed, June 9, 2010 2:20:43 PM
Subject: Re: Quick help
On Jun 9, 2010, at 04:31, Peter Schuller wrote:
>> I've had a server crash
>
> As jbellis points out there may be hardware issues, but if, in
> particular, the crash in question was a power outage a very common
> problem is running on a system which does not honor write barriers.
> Was it a
Its proper competitors are stuff like Redis and memcached.
On Fri, Jun 4, 2010 at 8:19 AM, Jones, Nick wrote:
> I saw a tweet claiming far better performance than Cassandra. After
> following up, I found out it requires the entire DB to reside in memory
> across the nodes.
>
>
>
> *Nick Jon
On Jun 9, 2010, at 1:00 PM, Ben Browning wrote:
> There really aren't "seed nodes" in a Cassandra cluster. When you
> specify a seed in a node's configuration it's just a way to let it
> know how to find the other nodes in the cluster. A node functions the
> same whether it is another node's seed
Hi Wei,
> 1) I found this: the 2-node cluster is slower (by about 30%) than one node for both
> writes and selects. Is this normal? (In theory, 2 nodes should be faster than one?)
> I am monitoring the 2 nodes and found they are working and flushing often (so both
> nodes are doing work)
Which consistency level are you using
I have about a million rows (each row with 100 cols) of the form
domain/!date/!id (e.g. gwm.com/!20100430/!CFRA4500) So I am interested in
getting all the ids (all cols) for a particular domain/date (e.g.
"gwm.ml.com/!20100430/!A" "gwm.ml.com/!20100430/!D"). I am looping in chunks of
6000 rows
What is the best way to pass a Cassandra client as a parameter? If you pass
it as a parameter, do you also have to pass the transport in order to be
able to close the connection? Is there any way to open or close the
transport directly from the client?
Essentially what I want to do is pass a Cassa
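One common way to handle this (a sketch of one possible approach, not an official
pattern; the class name and its methods are invented for illustration) is to pass a
small wrapper that owns both the Thrift client and its transport, so whoever receives
it can still close the underlying connection.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Hypothetical wrapper: bundles the Thrift client with its transport so callers
// that only receive the wrapper can still close the connection cleanly.
public class CassandraConnection {
    private final TTransport transport;
    private final Cassandra.Client client;

    public CassandraConnection(String host, int port) throws Exception {
        this.transport = new TSocket(host, port);
        this.client = new Cassandra.Client(new TBinaryProtocol(transport));
        this.transport.open();
    }

    public Cassandra.Client client() {
        return client;
    }

    public void close() {
        if (transport.isOpen()) {
            transport.close();
        }
    }
}

Callers then pass the CassandraConnection around instead of the bare client, and only
the code that created it calls close().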
If you are using the random partitioner and you want to do an EXPENSIVE row scan
... I found that I could iterate using start_key="" end_key="" for the first call
... and then on all other calls you provide start_key="LAST_KEY" from the
previous iteration. If you set count to 1000, then you'll get 1000
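A hedged sketch of that loop with the 0.6-era Thrift API; the keyspace, column family,
and page size are placeholders, and the only assumption is that paging by the last key
of the previous page is stable, not that keys come back in any meaningful order.

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class RangeScan {
    static void scanAllKeys(Cassandra.Client client) throws Exception {
        // Fetch up to 100 columns per row; empty start/finish means "all columns".
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
        ColumnParent parent = new ColumnParent("Standard1");

        String startKey = "";          // first call: open-ended range
        while (true) {
            KeyRange range = new KeyRange();
            range.setStart_key(startKey);
            range.setEnd_key("");
            range.setCount(1000);      // page size

            List<KeySlice> page = client.get_range_slices("Keyspace1", parent, predicate,
                                                          range, ConsistencyLevel.ONE);
            for (KeySlice slice : page) {
                // Every page after the first repeats the previous page's last key,
                // so skip it to avoid handling a row twice.
                if (!slice.getKey().equals(startKey)) {
                    System.out.println(slice.getKey());
                }
            }
            if (page.size() < 1000) {
                break;                 // a short page means we reached the end of the ring
            }
            startKey = page.get(page.size() - 1).getKey();
        }
    }
}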
> I don't get what you're saying. If you want to loop over your entire range
> of keys, you can do it with a range query, and start and finish will both be
> "". Is there any scenario where you would want to do a range query where
> start and/or finish do not equal "", if you use random partitionin
I don't get what you're saying. If you want to loop over your entire range
of keys, you can do it with a range query, and start and finish will both be
"". Is there any scenario where you would want to do a range query where
start and/or finish do not equal "", if you use random partitioning?
2010
If you need to add a new seed node, you should autobootstrap it first,
then afterwards change it to a seed. The only point of being a seed
is for *other* nodes after all.
On Tue, Jun 8, 2010 at 11:57 PM, Per Olesen wrote:
> Hi,
>
> Just a quick question on seed nodes and auto bootstrap.
>
> Am I
Evan was just doing a ritual CYA and saying "this is new technology"
[about a year ago].
On Tue, Jun 8, 2010 at 10:45 PM, Hector Urroz wrote:
> Hi all,
> We're starting to prototype Cassandra for use in a production system and
> became concerned about data corruption after reading the excellent a
As promised on IRC, we have also collected some information, as we are
(probably) seeing the same problem.
https://issues.apache.org/jira/browse/CASSANDRA-1177
On Wed, Jun 9, 2010 at 14:11, aaron morton wrote:
> May be related to CASSANDRA-1014
> https://issues.apache.org/jira/browse/CASSANDRA-10
Hi,
I'm using Cassandra to store some aggregated data in a structure like this:
KEY - product_id
SUPER COLUMN NAME - timestamp
and in the super column, I have a few columns with actual data.
I am using a scan operation to find the latest super column
(start=Long.MAX_VALUE, reversed=true, count
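For concreteness, a hedged sketch of that "latest super column" read with the 0.6-era
Thrift API; the keyspace and column family names ("Keyspace1", "ProductStats") are
placeholders, and it assumes the super column names are stored as 8-byte big-endian
longs so the column comparator's ordering matches the timestamps.

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.thrift.SuperColumn;

public class LatestSuperColumn {
    static SuperColumn latest(Cassandra.Client client, String productId) throws Exception {
        // Reversed slice: start from the largest possible timestamp and take one result.
        SliceRange range = new SliceRange(
            ByteBuffer.allocate(8).putLong(Long.MAX_VALUE).array(),  // start (newest)
            new byte[0],                                             // finish (start of row)
            true,                                                    // reversed
            1);                                                      // only the latest
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        List<ColumnOrSuperColumn> result = client.get_slice(
            "Keyspace1", productId, new ColumnParent("ProductStats"), predicate,
            ConsistencyLevel.ONE);
        return result.isEmpty() ? null : result.get(0).getSuper_column();
    }
}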
I feel that there is a significant bit of confusion here.
You CAN use start/finish when using get_range_slices with random partitioner.
But you can't make any assumptions about what key will be next in the range,
which is the whole point of "random". If you do know a specific key that you
care
Hi, Experts,
I am using a 2-node Cassandra cluster. I am running a load test (one
connection, trying to insert and select many posts in a discussion tree).
I need your help:
1) I found this: the 2-node cluster is slower (by about 30%) than one node for both writes and
selects. Is this normal? (In theory, 2 n
May be related to CASSANDRA-1014
https://issues.apache.org/jira/browse/CASSANDRA-1014
And this discussion
http://www.mail-archive.com/user@cassandra.apache.org/msg02923.html
Aaron
On 6 Jun 2010, at 08:14, Peter Schuller wrote:
>> INFO 17:54:18,567 GC for ParNew: 1522 ms, 69437384 reclaimed lea
To use the start and finish parameters at all, you need to use OPP. The start and
finish parameters don't work if you don't use OPP, i.e. the result set won't
satisfy start <= resultSet < finish.
2010/6/9 Ben Browning
> OPP stands for Order-Preserving Partitioner. For more information on
> partitioners, loo
There really aren't "seed nodes" in a Cassandra cluster. When you
specify a seed in a node's configuration it's just a way to let it
know how to find the other nodes in the cluster. A node functions the
same whether it is another node's seed or not. In other words, all of
the nodes in a cluster are
OPP stands for Order-Preserving Partitioner. For more information on
partitioners, look here:
http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner
To do key range slices that use both start and finish parameters and
retrieve keys in-order, you need to use an ordered partitioner -
eit
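As a hedged illustration of a bounded key-range query that only makes sense with an
ordered partitioner (0.6-era Thrift API; the keyspace and column family are placeholders,
and the example bounds reuse the domain/date keys mentioned earlier in the thread):

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class OrderedRange {
    // Fetch rows whose keys fall between two bounds, in key order. Only meaningful with
    // an order-preserving partitioner; with the random partitioner the range is a token
    // range and these key bounds carry no useful meaning.
    static List<KeySlice> keysBetween(Cassandra.Client client, String from, String to)
            throws Exception {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));

        KeyRange range = new KeyRange();
        range.setStart_key(from);   // e.g. "gwm.ml.com/!20100430/!A"
        range.setEnd_key(to);       // e.g. "gwm.ml.com/!20100430/!D"
        range.setCount(6000);

        return client.get_range_slices("Keyspace1", new ColumnParent("Standard1"),
                                       predicate, range, ConsistencyLevel.ONE);
    }
}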
Using an SSD for the commitlog device can boost performance when using
Cassandra in batch mode for fsync operations. From my experience, a
write operation can be 10 times faster when using an SSD for the commitlog.
On Mon, Jun 7, 2010 at 6:37 PM, Héctor Izquierdo wrote:
> Hi everyone.
>
> I wanted to know
On 9 June 2010 10:43, Sylvain Lebresne wrote:
> > However, as a final point of clarification, is there a particular reason
> > that insert does not raise an exception when trying to insert over an
> > existing key, or when the key points to a tombstone record ?
>
> Inserting over an existing key
> However, as a final point of clarification, is there a particular reason
> that insert does not raise an exception when trying to insert over an
> existing key, or when the key points to a tombstone record ?
Inserting over an existing key is an update of the record and in the Cassandra
data mode
For hard disk drives, the random IOPS numbers are primarily dependent on the
storage device's random seek time.
7200 RPM SATA drives - ~90 IOPS
10k RPM Serial Attached SCSI drives - ~140 IOPS
15k RPM Serial Attached SCSI drives - ~180 IOPS
Intel X2
Hi Jools,
when using a consistency level other than ALL, there is no way Cassandra
can tell whether a given key currently exists in a cluster. There may be
several concurrent insert or delete operations for the key in progress which
just have not yet propagated to the node that tries to dete
Hi Martin,
Many thanks for the succinct, and clear response.
I've got some pointers to move me in the right direction, many thanks.
However, as a final point of clarification, is there a particular reason
that insert does not raise an exception when trying to insert over an
existing key, or when
Perfect.
It's exactly the information I needed.
Thx Jonathan.
Hi Jools,
what happens in Cassandra with your scenario is the following:
1) insert new record
-> the record is added to Cassandra's dataset (with the given timestamp)
2) delete record
-> a tombstone is added to the data set (with the timestamp of the deletion,
which should be larger
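This is also why a re-insert can appear to vanish. A hedged sketch of the timestamp rule
with the 0.6-era Thrift API; the keyspace, column family, row key, and the literal
timestamps are placeholders, and the point is only that the highest timestamp wins,
whether it belongs to a value or to a tombstone.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class TombstoneTimestamps {
    static void demo(Cassandra.Client client) throws Exception {
        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("state".getBytes("UTF-8"));

        client.insert("Keyspace1", "row1", path, "v1".getBytes("UTF-8"), 100L,
                      ConsistencyLevel.QUORUM);
        client.remove("Keyspace1", "row1", path, 200L, ConsistencyLevel.QUORUM); // tombstone at t=200

        // Silently shadowed: timestamp 150 is older than the tombstone's 200, so reads
        // keep returning "not found" until a write with a timestamp greater than 200 arrives.
        client.insert("Keyspace1", "row1", path, "v2".getBytes("UTF-8"), 150L,
                      ConsistencyLevel.QUORUM);
    }
}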
Hi,
I've been developing a system against Cassandra over the last few weeks, and
I'd like to ask the community some advice on the best way to deal with
inserting new data where the key is currently a tombstone record.
As with all distributed systems, this is always a tricky thing to deal with,
so
> I've had a server crash
As jbellis points out there may be hardware issues, but if, in
particular, the crash in question was a power outage a very common
problem is running on a system which does not honor write barriers.
Was it a power outage?
--
/ Peter Schuller
In my opinion the #1 risk for corruption is user/client error with the
timestamps. Over time, Cassandra flushes data from memory to disks. After
it flushes to disk, Cassandra doesn't go back to delete or modify that data.
Because of this, deletes are performed by writing a "tombstone" to disk.